GRANT # N00014-95-1-0417

PRINCIPAL  INVESTIGATOR:  David  Baker 

INSTITUTION:  University  of  Washington 

GRANT  TITLE:  Towards  An  Alternative  for  Antibodies:  Construction  and 
Characterization of A Large Combinatorial Library of Diverse Binding
Proteins 

AWARD  PERIOD:  1  April  1998  -  31  March  2001 

OBJECTIVE:  The  overall  goal  of  this  project  is  to  create  a  large  and 
very  diverse  population  of  binding  proteins  from  which  individual 
members  can  be  selected  on  the  basis  of  target  specificity  and  be  used 
as  a  substitute  for  antibodies  in  biosensor  and  other  applications.  To 
accomplish this, we have synthesized a collection of combinatorial
libraries  in  which  the  binding  surfaces  of  several  small,  stable  parent 
proteins  are  randomized.  Because  of  the  types  of  parent  molecules 
chosen  and  because  of  the  way  the  libraries  are  designed,  the  variants 
are  expected  to  have  more  physical  stability  than  antibody  molecules 
and  to  be  able  to  function  in  more  severe  types  of  environments. 

APPROACH:  Phage  display  technology  is  used  to  construct  each  library. 
The parent molecules selected for randomization are calmodulin, the
c-src SH3 domain, the lck SH2 domain, and the immunoglobulin-binding
domain from bacterial protein L. These proteins each have very
different  binding  surface  shapes  and  their  natural  targets  cover  a  wide 
size  range  and  include  broad  sections  of  large  proteins,  peptides, 
exposed  loops  of  proteins  and  individual  amino  acid  side  chains. 

Library  designs  are  customized  for  each  protein  and  are  based  on  the 
most  current  structural  information  to  try  to  maximize  the  chance  of 
producing  variants  with  new  target  specificities  relative  to  the  parent 
molecule.  In  each  case,  residues  responsible  for  maintaining  the 
structural  integrity  of  the  protein  fold  were  not  changed.  In  the 
completed  libraries,  variant  proteins  are  displayed  on  the  surfaces  of 
phagemid  particles  where  they  are  folded  and  available  for  binding 
interactions.  Individual  variants  are  selected  by  multiple  rounds  of 
biopanning  populations  of  library  phagemids  against  immobilized  target 
molecules.  A  single  round  of  biopanning  consists  of  bulk  phase 
absorption  of  library  phagemids  to  a  test  target  followed  by  removal  of 
weakly or non-specifically bound phagemids by washing, and then the
elution,  amplification  and  finally  re-absorption  of  binders  to  the  same 
test  ligand  to  start  a  second  round  of  biopanning.  Variant  proteins 
carried  by  phagemids  obtained  in  such  screens  can  be  easily  purified, 
cloned,  expressed  and  adapted  for  use  in  specific  applications. 

ACCOMPLISHMENTS: 

(1)  Library  pool  completed. 

Seventeen  different  libraries  have  been  synthesized.  Four  libraries 
are variations of the c-src SH3 domain, two are from the lck SH2
domain,  nine  are  from  the  protein  L  immunoglobulin-binding  domain  and 
two are from calmodulin. Complexities are between 10^ and 10^ protein
variants per individual library and there are a total of approximately
4x10^ different variants in the library pool.

(2)  Recovery  of  proteins  with  new  binding  specificities  from  the 
library  pool. 

(a)  Variant  proteins  which  bind  to  a  lambda-chain  carrying 
immunoglobulin  target  that  has  no  affinity  for  wild  type  protein  L  have 
been  recovered  from  a  library  screen.  The  first  variant  characterized 
binds the new target with a dissociation constant of 4x10^ M.

(b)  In  order  to  evaluate  the  library  pool  and  determine  whether 
it  is  a  practical  source  of  receptors,  we  biopanned  library  phagemid 
populations  against  31  different  compounds.  The  collection  of  test 
compounds  was  chosen  randomly  and  included  20  proteins,  4  peptides  and 
7  small  molecules.  None  of  the  ligands  showed  measurable  affinity  for 
any  of  the  library  parent  proteins.  After  four  rounds  of  biopanning, 
we detected apparent binding proteins to 22 of the 31 ligands tested
(71%). By "apparent binders" we mean that the number of phagemids retained
after washing was 10-100x greater than the background
binding levels of phagemids to the negative controls and comparable to
the highest levels of retention observed for the positive control
targets. Of the 22 ligands able to select apparent binders out of the
library  pool,  seventeen  were  proteins,  two  were  peptides  and  three  were 
small organic molecules. The original group of test ligands contained
similar ratios of proteins, peptides and small molecules, indicating
that there are no obvious target-size binding preferences among the proteins
in the library pool.
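The class breakdown reported above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the counts given in this section; the variable and function names are illustrative and not part of the original report.

```python
# Ligand counts as reported in the text (tested vs. yielding apparent binders).
TESTED = {"protein": 20, "peptide": 4, "small molecule": 7}
HITS = {"protein": 17, "peptide": 2, "small molecule": 3}


def shares(counts):
    """Return each class's fraction of the total count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def overall_hit_rate(tested, hits):
    """Fraction of all tested ligands that yielded apparent binders."""
    return sum(hits.values()) / sum(tested.values())


def per_class_hit_rate(tested, hits):
    """Fraction of ligands within each class that yielded apparent binders."""
    return {name: hits[name] / tested[name] for name in tested}


tested_share = shares(TESTED)   # composition of the test set
hit_share = shares(HITS)        # composition of the set of hits
class_rate = per_class_hit_rate(TESTED, HITS)

print(f"overall hit rate: {overall_hit_rate(TESTED, HITS):.0%}")
for name in TESTED:
    print(
        f"{name:15s} share of tested {tested_share[name]:.0%}, "
        f"share of hits {hit_share[name]:.0%}, "
        f"hit rate within class {class_rate[name]:.0%}"
    )
```

With only four peptides and seven small molecules in the test set, the per-class figures are rough estimates rather than precise rates.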

CONCLUSIONS:  The  library  pool  appears  to  be  a  good  source  of  general 
binding  proteins  as  it  contained  apparent  binding  proteins  for  more 
than  2/3  of  the  ligands  tested.  This  is  especially  significant  in  that 
the ligands were chosen randomly and, as a group, spanned a wide range
of  sizes,  shapes  and  chemistries. 

SIGNIFICANCE:  Individual  library  variants  will  be  suitable  for  many 
commercial  and  basic  science  applications.  These  include  their  use  as 
receptors  for  biosensors,  as  purification  reagents  for  affinity 
chromatography  and  as  detection  probes  that  can  be  used  to  identify 
specific proteins or individual post-translational modifications within
populations  of  proteins  dispersed  on  Western  blots  or  within  cells 
processed  for  fluorescence  microscopy. 
Simplified proteins: minimalist solutions to the ‘protein folding
problem’

Kevin W Plaxco*, David S Riddle†, Viara Grantcharova‡ and
David Baker#


Recent  research  has  suggested  that  stable,  native  proteins 
may  be  encoded  by  simple  sequences  of  fewer  than  the 
full  set  of  20  proteogenic  amino  acids.  Studies  of  the 
ability  of  simple  amino  acid  sequences  to  encode  stable, 
topologically  complex,  native  conformations  and  to  fold  to 
these  conformations  in  a  biologically  relevant  time  frame  have 
provided  insights  into  the  sequence  determinants  of  protein 
structure  and  folding  kinetics.  They  may  also  have  important 
implications  for  protein  design  and  for  theories  of  the  origins 
of  protein  synthesis  itself. 

Addresses 

Department  of  Biochemistry,  Box  357350,  University  of  Washington, 

Seattle,  WA  98195,  USA 

*e-mail: kwp@elina.bchem.washington.edu

†e-mail: riddle@u.washington.edu

‡e-mail: grantch@u.washington.edu

#e-mail: baker@ben.bchem.washington.edu

Current Opinion in Structural Biology 1998, 8:80-85

http://biomednet.com/elecref/0959440X00800080

© Current Biology Ltd ISSN 0959-440X

Abbreviations
CD circular dichroism
SH3 Src homology 3


Introduction 

Nature  has  solved  the  ‘protein  folding  problem’  countless 
times,  generating  thousands  of  families  of  rapidly  folding, 
stable  proteins  with  unique  native  conformations.  The 
vast  majority  of  these  proteins  are  encoded  by  complex 
patterns  of  the  20  proteogenic  amino  acids  [1-3].  This 
pattern  and  compositional  complexity  has  proven  to  be 
a  significant  hurdle  to  theorists,  experimentalists  and 
engineers  attempting  to  explain  or  reproduce  the  primary 
features  of  proteins.  But  is  the  full  complexity  of  naturally 
occurring  sequences  required  to  encode  their  unique 
native  structures?  And,  if  not,  what  might  ‘simplified’ 
sequences  teach  us  about  how  a  linear  string  of  amino 
acids  encodes  a  complex  three-dimensional  structure  and 
can  rapidly  discriminate  between  this  structure  and  the 
astronomically  large  number  of  conformations  accessible  to 
the  unfolded  state? 

In  this  paper,  we  will  review  recent  experiments  aimed 
at  designing  or  selecting  for  significantly  simplified  amino 
acid  sequences  capable  of  forming  stable,  native  or 
native-like  proteins.  In  discussing  some  of  the  insights 
into  the  sequence  determinants  of  structure  and  folding 
kinetics gathered from these studies, we will place
particular emphasis on recent investigations of simplified
hydrophobic  cores  and  fully  simplified  proteins,  as  well  as 
on  the  ability  of  simplified  proteins  to  rapidly  fold  to  their 
native  structures. 

Simplified  core  packing 

The  tight  packing  observed  in  the  interiors  of  proteins  has 
led  to  the  suggestion  that  the  complementary  shapes  and 
sizes of core residues define a protein’s fold in a fashion
analogous to the manner in which the shapes of individual
jigsaw pieces determine the overall layout of the finished
puzzle (discussed in [4,5**,6-10]). Experimental tests of
this hypothesis, however, have repeatedly demonstrated
that dramatic core changes can be accommodated without
significantly disrupting native structure [5**,6,11,12]. But
although these studies indicate that the rules of core packing
are fairly flexible, the variant proteins investigated all
maintain  the  high  degree  of  core  complexity  characteristic 
of  naturally  occurring  proteins.  Thus  the  question  remains: 
is  it  possible  for  a  simple  amino  acid  sequence,  presumably 
lacking  highly  complementary  side-chain  interactions,  to 
encode  a  well-packed  hydrophobic  core?  Recent  evidence 
suggests  that  it  is. 

An  earlier  series  of  core  simplification  attempts  (reviewed 
in [13]) is illustrative of the approach. Regan and co-workers
[14,15] have produced variants of the RNA-binding
protein Rop with highly simplified hydrophobic cores. Rop
is a semi-regular dimer of two-helix monomers packed
to form a four-helix bundle. Like coiled-coil proteins,
the sequence of Rop is an array of heptad repeats
(abcdefgabcdefg) in which positions a and d contribute to
the  hydrophobic  core.  In  the  wild-type  protein,  six  of  16 
a  sites  and  eight  of  16  d  sites  are  alanine  and  leucine 
respectively.  A  variant  of  Rop  with  all-alanine  a  sites 
and  all-leucine  d  sites  is  significantly  more  thermostable 
than  the  wild-type  protein  and  folds  into  a  fully  native 
protein  [14,15].  An  alternating  sequence  of  large  and  small 
hydrophobic  residues  may  be  critical  for  the  formation  of 
native  core  packing:  the  over-packed  all-leucine  variant 
forms  a  very  stable  molten  globule  and  the  under-packed 
all-alanine  variant  remains  unstructured  (Table  1).  This 
suggests  that,  perhaps  because  of  constraints  associated 
with  the  symmetrically  packed  helical  structure  of  Rop, 
a  two-letter  core  alphabet  and  limited  pattern  complexity 
are  required  to  encode  a  native  structure. 
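The heptad bookkeeping behind these core variants can be sketched in a few lines of Python. This is a minimal illustration, assuming the register starts at position a; the function names and the 14-residue toy sequence are hypothetical and are not the Rop sequence.

```python
HEPTAD = "abcdefg"


def heptad_register(length, start="a"):
    """Assign a heptad position (a-g) to each residue index."""
    offset = HEPTAD.index(start)
    return [HEPTAD[(offset + i) % 7] for i in range(length)]


def simplify_core(sequence, a_residue="A", d_residue="L", start="a"):
    """Replace every a-site and d-site residue, leaving b, c, e, f, g untouched."""
    register = heptad_register(len(sequence), start)
    simplified = []
    for residue, position in zip(sequence, register):
        if position == "a":
            simplified.append(a_residue)
        elif position == "d":
            simplified.append(d_residue)
        else:
            simplified.append(residue)
    return "".join(simplified)


# Hypothetical two-heptad stretch used only to show the substitution pattern.
toy = "QEKTALNMARFIRS"

alternating = simplify_core(toy, a_residue="A", d_residue="L")  # Ala at a, Leu at d
all_leucine = simplify_core(toy, a_residue="L", d_residue="L")  # over-packed analogue
all_alanine = simplify_core(toy, a_residue="A", d_residue="A")  # under-packed analogue

print(alternating)
print(all_leucine)
print(all_alanine)
```

The three calls mirror the three core schemes discussed above: alternating small and large residues, all-leucine and all-alanine.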

The  simplification  of  the  core  of  Rop  was  based  on,  and 
perhaps  somewhat  limited  by,  the  regular  structure  of  four- 
helix  bundles,  but  studies  of  globular  proteins  indicate 
that  they  may  be  even  more  amenable  to  simplification. 
Table 1

Some simplified proteins and the sequences from which they were derived.

Name                ΔGu          kf       Simplification scheme                        References
                    (kcal/mol)   (s^-1)
WT Rop              7.7          0.013                                                 [14,15,37**]
Ala2Leu2-8          7.5          7.9      Alternating leucine-alanine core
Ala4-8              <0                    All alanine core
Leu4-8              30                    All leucine core
WT T4 lysozyme                                                                         [16**]
7 met T4 lysozyme   (-5.0)*               7 of 10 core positions: methionine
10 met T4 lysozyme  (-7.3)*               10 of 10 core positions: methionine
WT Cro repressor    2.0†                                                               [17]
M-Cro               0.4                   11 of 13 core positions: leucine
α1                  11.4                  Simple helical homotetramer (GDLK)           [22-24]
α4                  15.4                  Plus helix cap and turns (GDLKPR)
it- His 4           10.3                  Plus metal binding site (GDLKPRH)
WT Src SH3          3.7          57                                                    [26**]
FP1                 3.0          93       40 of 45 non-active site residues (GEIKA)
FP2                 1.7          57       39 of 45 non-active site residues (GEIKA)

*Absolute unfolding free energies are not available. Unfolding free energies (ΔΔGu) relative to wild-type are reported. †Stability of point mutant from
which simplified variants were derived. FP, full protein; WT, wild-type.

Wild-type T4 lysozyme consists of two domains, the larger
of which contains a hydrophobic core of approximately 10
residues. Matthews and co-workers [16**] have recently
characterized variants with seven and 10 of these residues
replaced by methionine. Surprisingly, although somewhat
destabilized (Table 1), even the 10-methionine variant
folds to a partially active and thus presumably very
native-like conformation. Crystallographic analysis of the
seven-methionine variant indicates that the core is so well
packed that the total volume of the new core methionines
is slightly less than would be predicted from the density
of well-packed methionine crystals [4]. Although the
average backbone deviation between the wild type and
seven-methionine variant is only 0.2 Å, changes of up to
1.0 Å are observed at several backbone positions. It is
via these small conformational rearrangements that the
simplified variant is able to accommodate such dramatic
compositional changes and generate this well-packed core.

Studies of simplified core variants of the phage 434 Cro
protein suggest that the flexible, unbranched side
chain of methionine is not required to create highly
simplified hydrophobic cores [17]. One variant, with 11
of 13 core residues replaced with leucine, exhibits the
folding cooperativity and NMR dispersion expected of a
fully folded protein, although the variant is significantly
destabilized (Table 1). These examples of well-packed
hydrophobic cores constructed from extremely simple
amino acid sequences suggest that the packing of core
residues need not mimic the precise spatial complementarity
of the pieces of a jigsaw puzzle in order to encode a
unique, native conformation.


Simplified  proteins 

If hydrophobic cores can be built from one or two residue
types and very simple sequence patterns, what are the
minimum size amino acid alphabet and the simplest
sequence patterns that can encode entire proteins? Several
important ‘alanine minimization’ studies have confirmed
that much of the sequence of naturally occurring globular
proteins is redundant and can be replaced by alanine
provided a ‘scaffold’ of residues is maintained that encodes
a hydrophobic core and defines important secondary structural
elements and tertiary contacts [18,19]. Hecht and
co-workers [5*,20,21*] have demonstrated that four-helix
bundle proteins can be generated from highly divergent
sequences of 11 of the proteogenic amino acids, as long as
the correct binary pattern of hydrophobic and hydrophilic
residues is maintained. Although the non-native structures
of the all-alanine and all-leucine Rop core variants suggest
that many of these sequences do not encode fully native
proteins, the extremely high recovery (60%) of soluble,
protease-resistant and potentially native-like species
reported suggests that, with a well-chosen reduced alphabet
and the correct patterning of hydrophobic and hydrophilic
residues, even compositionally simple sequences are likely
to be able to encode folded proteins.
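The idea of a binary pattern can be made concrete with a short Python sketch. It draws a sequence from a string of hydrophobic (H) and polar (P) positions and checks that a sequence obeys the pattern. The two residue sets (five nonpolar, six polar, 11 in total) and the 14-position pattern are assumptions chosen for illustration; they are not specified in this review.

```python
import random

# Assumed residue sets for illustration only: 5 nonpolar + 6 polar = 11 types.
NONPOLAR = "FLIMV"
POLAR = "EDKNQH"


def sequence_from_pattern(pattern, rng):
    """Draw one residue per position: 'H' from NONPOLAR, 'P' from POLAR."""
    residues = []
    for symbol in pattern:
        if symbol == "H":
            residues.append(rng.choice(NONPOLAR))
        elif symbol == "P":
            residues.append(rng.choice(POLAR))
        else:
            raise ValueError(f"unexpected pattern symbol: {symbol!r}")
    return "".join(residues)


def matches_pattern(sequence, pattern):
    """True if every residue belongs to the class its position demands."""
    if len(sequence) != len(pattern):
        return False
    for residue, symbol in zip(sequence, pattern):
        allowed = NONPOLAR if symbol == "H" else POLAR
        if residue not in allowed:
            return False
    return True


# Hypothetical amphipathic-helix-like pattern (not taken from the cited work).
helix_pattern = "PHPPHHPPHPPHHP"

rng = random.Random(0)  # fixed seed so the sketch is reproducible
library = [sequence_from_pattern(helix_pattern, rng) for _ in range(5)]
for member in library:
    print(member, matches_pattern(member, helix_pattern))
```

Every sequence drawn this way differs in identity but shares the same hydrophobic and hydrophilic periodicity, which is the property the binary-pattern argument relies on.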

DeGrado  and  co-workers  [22-24]  have  been  exploring 
this  issue  through  a  series  of  elegant  studies  directed 
towards  the  de  novo  design  of  four-helix  bundle  proteins. 
Their  earlier  efforts,  aimed  at  simple,  multimeric  proteins 
built  entirely  of  glycine,  glutamate,  leucine  and  lysine 
culminated  in  the  production  of  short  helical  segments 
capable of forming four-helix-bundle-like tetramers in
solution [22]. The addition of proline residues to break
the helices and arginine residues to create interhelix loops
resulted in the creation of a monomeric four-helix bundle
[22,23]; however, despite the relatively high stability of
the four-helix bundle (Table 1), it exhibits the rather poor
hydrogen exchange protection and non-unique core packing
[24] characteristic of the over-packed, all-leucine core
variant  of  Rop.  The  formation  of  truly  native  four-helix 
bundles  required  the  addition  of  several  histidine  residues 
to  form  a  metal-binding  site  [24],  although  comparison 
with  the  Rop  structure  suggests  that  the  addition  of  a 
limited  degree  of  core  complexity  might  also  produce  a 
native  structure. 

Simplified  helical  bundle  proteins  take  advantage  of 
the  simple,  periodic  amino  acid  patterns  of  amphipathic 
helices.  Work  in  our  laboratory,  however,  has  suggested 
that  it  is  also  possible  to  build  complex,  nonregular 
topologies  from  reduced  amino  acid  alphabets.  Using  a 
phage  display  approach  (see  [25]  and  references  therein), 
we  have  selected  for  significantly  simplified  sequences 
that  fold  into  the  structure  of  the  topologically  complex, 
predominantly β-sheet SH3 (Src homology 3) domain.
And  how  few  residue  types  does  it  take  to  make 
this  topologically  complex  structure?  Attempts  to  select 
for  proteins  comprising  a  three-residue  alphabet  (lysine, 
isoleucine  and  glutamate)  were  relatively  unsuccessful  as 
numerous  alanine  and  glycine  residues  were  maintained  in 
the  few  folded  proteins  recovered.  These  two  residues  are 
probably  required  because  of  the  need  for  a  small  nonpolar 
residue to pack with the large, β-branched isoleucine and
because  of  the  propensity  of  glycine  to  form  tight  turns. 
The  inclusion  of  these  two  residues  into  the  reduced 
alphabet  led  to  the  recovery  of  several  highly  simplified 
sequences  that  adopt  the  SH3  fold.  The  two  simplest 
variants  recovered  to  date  are  68%  and  70%  composed  of 
this  five-residue  set  (89%  and  90%  respectively  of  residues 
outside  the  active  site;  Figure  1)  and  yet  their  stability 
(Table  1),  activity,  NMR  and  circular  dichroism  (CD) 
spectra  [26**]  are  those  of  fully  native  proteins.  Although 
these  variants  are  almost  as  compositionally  simplified  as 
the  helical  proteins  described  above,  they  are  encoded 
by  highly  complex,  nonrepetitive  amino  acid  sequences. 
This  is  consistent  with  the  observation  that  naturally 
occurring β-sheet proteins lack the strong patterning of
hydrophobic and hydrophilic residues apparent in their
α-helical counterparts [2]. These results indicate that a
five-residue  alphabet  may  be  sufficient  to  encode  all  of  the 
functions  required  to  generate  even  topologically  complex 
proteins. 
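Percentages such as "68% composed of this five-residue set" are simple composition counts. The sketch below, in Python, computes the fraction of a sequence drawn from a reduced alphabet, optionally ignoring a set of positions such as active-site residues. The alphabet follows the GEIKA abbreviation used in Table 1; the 20-residue sequence and the excluded positions are hypothetical and are not an SH3 variant.

```python
REDUCED_ALPHABET = frozenset("GEIKA")


def alphabet_fraction(sequence, alphabet=REDUCED_ALPHABET, exclude=()):
    """Fraction of residues in `alphabet`, skipping 0-based positions in `exclude`."""
    skipped = set(exclude)
    considered = [
        residue
        for index, residue in enumerate(sequence.upper())
        if index not in skipped
    ]
    if not considered:
        raise ValueError("no residues left to count")
    in_alphabet = sum(1 for residue in considered if residue in alphabet)
    return in_alphabet / len(considered)


# Hypothetical 20-residue sequence; W (index 10) and Y (index 14) play the
# role of functional residues that were not subject to simplification.
toy = "AEIKGKEAIGWEKAYIKGTP"

whole = alphabet_fraction(toy)
outside_site = alphabet_fraction(toy, exclude={10, 14})

print(f"whole sequence: {whole:.0%}")
print(f"outside excluded positions: {outside_site:.0%}")
```

Excluding functional positions raises the fraction, which is why the text quotes one figure for the whole protein and a higher one for residues outside the active site.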

The  simplest  proteins 

If four-helix bundles and globular β-sheet proteins can be
encoded by simple amino acid sequences, what fraction of
simple, random sequences might encode native proteins?
Recent work conducted in our laboratory and by Sauer
and co-workers [27-29] indicates that a surprisingly large
fraction of even the very simplest sequences fold, although
perhaps not to fully native proteins, to at least protein-like
structures. Sauer and co-workers [27-29] have investigated
simple, 80-residue proteins made up almost entirely of
glutamine, leucine and arginine. Approximately 1% of
these randomly generated sequences encode proteins that
are sufficiently protease resistant to be recovered from
the Escherichia coli expression library used in the study,
of which approximately two-thirds display cooperative
thermal denaturation transitions and CD signatures characteristic
of folded, helical proteins. Many of these proteins
also form apparently well-defined multimeric complexes
similar to those formed by naturally occurring proteins.
Hydrogen-exchange experiments, however, indicate that,
although the recovered proteins are native-like, they exhibit
the reduced levels of hydrogen-exchange protection
characteristic of the molten globule state. The fraction of
such sequences that encode fully native proteins remains
to be determined.

Figure 1

A model of the structure of a highly simplified SH3 domain. 68%
of FP2 is composed of just five residue types (isoleucine, lysine,
aspartate, glycine and alanine), which are shown as a grey backbone
ribbon. The active site and recognition peptide residues are rendered
in black with side chains, and residues not simplified in FP2 are shown
as black tubes. 90% of residues outside the binding cleft are
composed of the reduced amino acid alphabet. FP2 is the most
simplified protein recovered so far from simple sequence SH3
domain libraries but may not represent the simplest possible SH3
sequence. The observation of other variants simplified at several of
the positions not simplified in FP2 suggests that these positions do
not necessarily encode unique structural information. The structure,
stability and folding kinetics of FP2 suggest that the full complexity
of naturally occurring sequences is not required to encode rapidly
folding, topologically complex, native proteins [26**]. FP, full protein.
Our laboratory has studied the properties of a chemically
synthesized ‘sequence space soup’ of heteropolymers
composed of leucine (40%), ornithine (30%) and glutamate
(30%). Populations of random sequences with this
bulk composition and of homogeneous molecular weight
were generated. They were readily soluble in aqueous
buffer and exhibited considerable helical structure that
was disrupted by guanidine hydrochloride denaturation
(Figure 2). Gel filtration and small angle X-ray scattering
experiments indicate that the majority of these random
peptides form small oligomers (V Grantcharova, D Baker,
unpublished data). Overall, the properties of the population
were similar to those of the individual sequences
characterized by Sauer and co-workers [27-29], suggesting
that helicity and the tendency to form small oligomers
are properties frequently encoded by this class of simple
sequences.
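A population with a fixed bulk composition but random sequence can be simulated in a few lines of Python. The sketch below assumes that each position is drawn independently with the stated probabilities, which is a simplification of the chemical synthesis; the one-letter symbol "O" for ornithine, the population size and the seed are arbitrary choices made for illustration.

```python
import random
from collections import Counter

# Bulk composition from the text; "O" is an assumed placeholder for ornithine.
COMPOSITION = {"L": 0.40, "O": 0.30, "E": 0.30}


def random_peptide(length, rng, composition=COMPOSITION):
    """Draw one peptide, choosing each residue independently by composition."""
    residues = list(composition)
    weights = [composition[residue] for residue in residues]
    return "".join(rng.choices(residues, weights=weights, k=length))


def bulk_composition(peptides):
    """Pooled residue fractions across the whole population."""
    counts = Counter("".join(peptides))
    total = sum(counts.values())
    return {residue: counts[residue] / total for residue in sorted(counts)}


rng = random.Random(1998)  # fixed seed so the sketch is reproducible
soup = [random_peptide(70, rng) for _ in range(2000)]
bulk = bulk_composition(soup)

for residue, fraction in bulk.items():
    print(f"{residue}: {fraction:.3f}")
print(f"distinct sequences in the population: {len(set(soup))}")
```

Individual peptides vary in composition around the bulk values, while the pooled fractions converge on 40/30/30 as the population grows.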


Figure 2

Sequence space soups comprising 23-, 46- or 70-residue random
polymers of leucine, glutamate and ornithine (solid lines from top
down) fold to form α-helical structures as indicated by these CD
spectra. Because these amino acids have the same molecular weight,
all of the peptides in each population are of equal mass. The
helical structure of the 70-residue peptides is lost in the presence
of guanidine hydrochloride (dotted line). Despite the constant
molecular mass, the 70-mer population exhibits hydrodynamic
behavior consistent with the formation of small oligomers (data not
shown).


The  folding  kinetics  of  simplified  sequences 

In 1936, Mirsky and Pauling [30] correctly surmised that
the renaturation of a denatured protein requires that
a single native structure be distinguished from among
approximately 10^^ possible unfolded conformations.
Subsequently, Anfinsen [31] demonstrated that all of the information
required to rapidly perform this discrimination
is encoded by a protein’s primary sequence. More than
thirty-five years later, investigations of the mechanisms by
which this ‘kinetic half of the protein folding problem’ is
resolved remain an area of active research. Recent studies
of the folding kinetics of highly simplified proteins have
attempted to define the sequence determinants of this
process.

A  fundamental  aspect  of  the  folding  of  proteins  is  that 
an  extended  and  highly  disordered  polymer  chain  must 
collapse  to  form  a  compact,  globular  protein.  It  has  been 
postulated  that  a  largely  random  collapse  process,  driven 
by  the  exclusion  of  hydrophobic  residues  from  the  solvent, 
is  followed  by  the  reorganization  of  this  compact  state  to 
form  the  native  conformation  (reviewed  in  [32,33,34*]). 

If  this  theory  is  correct,  hydrophobic  core  simplification 
might  be  expected  to  disrupt  collapse  efficiency  and 
reduce  otherwise  optimized  folding  rates.  Moreover,  if  the 
diffusive rearrangement of a collapsed intermediate is a
rate-limiting folding step, then core sequence redundancy
might slow folding as degenerate conformations with near-native
packing trap the rearrangement process [35*,36].

Recent  studies  of  the  folding  kinetics  of  simplified  core 
Rop  variants  do  not  support  a  role  for  precise  chain  packing 
in  directing  the  folding  process.  All  of  the  simplified  core 
variants  of  Rop  fold  more  rapidly  than  does  the  wild-type 
protein  (Table  1)  —  the  alternating  alanine  and  leucine 
core  variant  an  amazing  610  times  more  rapidly  [37**]. 
Although  this  acceleration  may  be  due  to  the  elimination 
of hydrophilic core residues [37**,38], the fact that the refolding
rates of the simplified variants are not decelerated
indicates that highly unique, jigsaw-like core packing
might not be a fundamental requirement of rapidly folding
proteins. A small protein like Rop, however, might not
accurately model the folding processes of larger proteins,
which are thought to fold in a two-step process that is
much more dependent on the rate of core reorganization
in collapsed intermediate species [32,33,34*]. We thus
eagerly  await  the  results  of  studies  of  the  refolding  of 
simplified  core  variants  of  larger  proteins. 
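The 610-fold figure quoted above follows from the two Rop folding rates listed in Table 1, 0.013 and 7.9 s^-1, read here as the wild-type and alternating alanine-leucine values respectively. A minimal Python sketch of the arithmetic is given below. The conversion of the rate ratio into an apparent change in activation free energy is an added illustration that assumes an unchanged pre-exponential factor and a temperature of 298.15 K, neither of which is stated in the text.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def fold_acceleration(k_variant, k_wild_type):
    """Ratio of variant to wild-type folding rate constants."""
    return k_variant / k_wild_type


def apparent_barrier_change(k_variant, k_wild_type, temperature_k=298.15):
    """Apparent change in activation free energy, -RT ln(k_variant / k_wild_type).

    Assumes both proteins share the same pre-exponential factor, so the
    result is an illustration rather than a measured quantity.
    """
    ratio = k_variant / k_wild_type
    return -R_KCAL * temperature_k * math.log(ratio)


K_WILD_TYPE = 0.013  # s^-1, from Table 1
K_VARIANT = 7.9      # s^-1, from Table 1

ratio = fold_acceleration(K_VARIANT, K_WILD_TYPE)
barrier = apparent_barrier_change(K_VARIANT, K_WILD_TYPE)

print(f"acceleration: about {ratio:.0f}-fold")
print(f"apparent barrier change: {barrier:.1f} kcal/mol")
```

The ratio comes out slightly below 610 because the tabulated rates are rounded to two significant figures.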

If specific core packing is not required to generate
rapidly folding proteins, then what are the determinants
of protein folding kinetics? Recent investigations of the
folding rates of simplified SH3 variants suggest instead
that folding rates may be largely determined by the
interactions that stabilize the native state and might not
depend on specific, conserved folding pathways [39].
Despite dramatic changes in amino acid sequence and
overall amino acid composition, the two most highly
simplified SH3 variants characterized to date refold as
fast as or faster than the wild-type SH3 sequence from
which they were derived (Table 1) [26**]. Moreover, the
simplified variants and the wild-type sequence maintain
qualitatively very similar folding kinetics: all fold in
simple, two-state processes via relatively similar transition
state conformations. That these variants fold rapidly
despite the absence of any obvious selection for this
characteristic suggests that rapid folding is more a feature
of a stable, cooperative native fold, which was a selection
criterion, than of specific, conserved folding pathways.
The rapid folding of these simple variants also suggests
that the compositional complexity found in naturally
occurring proteins may not be required to encode their
rapid folding.


Simple  proteins  and  the  origins  of  protein 
synthesis 

Protein synthesis is a complex process involving numerous
tRNAs, acyl-tRNA synthetases and many other
critical  components.  How  might  such  a  complex  system 
have  arisen  spontaneously?  Despite  a  long  history  of 
investigation  (see,  for  example,  [40-44])  no  answer 
to  this  question  appears  forthcoming.  The  ability  of 
reduced  alphabet  proteins  to  fold  rapidly  to  stable,  native 
structures  suggests,  however,  that  the  full  complexity 
seen  in  contemporary  protein  synthesis  might  not  have 
been  required  to  generate  primordial  proteins  capable  of 
providing  a  selective  advantage  [45]. 


Conclusions 

Although  significant  advances  have  been  made  towards 
the  goal  of  constructing  highly  compositionally  simplified 
proteins, it is notable that native proteins composed of
fewer than seven amino acid types have not yet been
demonstrated. This is no more simple than the simplest
structured,  naturally  occurring  proteins.  (The  single-helix, 
48-residue  yellowtail  flounder  antifreeze  protein,  to  our 
knowledge,  holds  the  record  at  just  seven  residue  types 
[46,47*].)  Although  this  may  suggest  that  more  highly 
simplified  native  proteins  cannot  exist,  the  preceding 
studies  have  demonstrated  that  even  smaller  subsets  of 
the proteogenic amino acids can encode each of the
individual  properties  characteristic  of  naturally  occurring 
proteins.  Thus  we  are  optimistic  that  future  attempts 
to  generate  more  significantly  simplified  proteins  will 
prove  fruitful.  The  demonstration  of  such  proteins  should 
facilitate  de  novo  protein  design  efforts  by  indicating  the 
minimal  sequence  elements  required  to  encode  native 
folds.  These  studies  should  also  serve  to  filter  the  ‘noise’ 
out  of  naturally  occurring  protein  sequences,  and  project 
into  sharp  contrast  the  major  determinants  of  structure  and 
folding. 


Note  added  in  proof 

Stroud  and  co-workers  [48**]  have  recently  published  the 
synthesis  and  structure  of  a  de  novo  designed  108-residue 
protein  composed  of  just  seven  residue  types  that  folds 
into  a  fully  native  four-helix  bundle. 
Important role of hydrogen bonds in the
structurally polarized transition state for
folding of the src SH3 domain

Viara P. Grantcharova, David S. Riddle, Jed V. Santiago and David Baker


Experimental and theoretical studies on the folding of small proteins such as
CI-2 and the P22 Arc repressor suggest that the folding transition state is an
expanded version of the native state with most interactions partially formed.
Here we report that this picture does not hold generally: a hydrogen bond
network involving two β-turns and an adjacent hydrophobic cluster appear to be
formed in the folding transition state of the src SH3 domain, while the
remainder of the polypeptide chain is largely unstructured. Comparison with
data on other small proteins suggests that this structural polarization is a
consequence of the topology of the SH3 domain fold. The non-uniform
distribution of structure in the folding transition state provides a
challenging test for computational models of the folding process.

A detailed understanding of how amino acid sequences determine protein
three-dimensional structures requires the identification of residues and
interactions that play critical roles in the folding process. In previous
experiments directed at establishing the minimal sequence requirements for
the folding of a small protein, the SH3 domain, a combinatorial library
selection strategy was used to obtain functional SH3 domains composed
primarily of I, K, E, A and G outside of the binding site. The folding rates
of these simplified SH3 variants were found to be very similar to that of the
wild type protein, suggesting that the residues critical to folding kinetics
must have been conserved in the selection. Here we investigate the roles of
the conserved residues (Table 1) in the folding reaction by studying the
consequences of alanine substitutions at these positions on the
thermodynamics and kinetics of folding. Initially we focused on a non-local
hydrogen bonding network involving Glu 30, Ser 47 and Thr 50 (Fig. 1) due to
the strong selection pressure observed at these positions (Glu 30 is
absolutely conserved and Ser 47 is frequently recovered even though it was
not allowed in the mutagenesis strategy). The finding that these residues are
important for the kinetics of folding prompted us to extend the analysis to
residues throughout the src SH3 domain in order to obtain a more complete
picture of the folding transition state.

Contributions  to  src  SH3  stability 

The structure of the src SH3 domain consists of two β-sheets orthogonally
packed around a hydrophobic core. Within the sheets, strands are joined by
the RT, n-src and distal loops, while the crossovers between the two sheets
occur at a diverging type II β-turn and a short 3₁₀-helix (Fig. 1a). The
residues conserved in the combinatorial library selection were located in the
distal loop and diverging turn (Gly 29, Gly 51), the hydrogen bond network
between them (Glu 30, Ser 47 and Thr 50; Fig. 1b), and the hydrophobic core
(Phe 10, Leu 24, Ile 34, Ala 45, Ile 56). Substitutions in the hydrophobic
core were most destabilizing, with mutations in the center of the core
(Ala 45, Leu 32, Ile 56) having larger effects (ΔΔGU from 1.8-3.6 kcal mol⁻¹)
than mutations at the periphery (Leu 24, Phe 10, Val 61, Ile 34; ΔΔGU from
0.6-1.8 kcal mol⁻¹). Mutations in the hydrogen bonding network also decreased
ΔGU significantly (ΔΔGU from 1.8-2.5 kcal mol⁻¹), indicating the importance
of these interactions in stabilizing the native state. Each of the mutated
residues makes both local and non-local hydrogen bonds and thus the total
energetic cost of the mutations is consistent with previous estimates of
1-2 kcal mol⁻¹ per hydrogen bond. Partially exposed aromatic residues lining
the peptide binding pocket (Trp 42, Tyr 16, Tyr 55) can be viewed as
extensions of the hydrophobic core, and alanine substitutions at these
positions decreased stability. On the other hand, mutation of the completely
solvent exposed Tyr 60 to alanine increased ΔGU, probably by destabilizing
non-native conformations in which this residue is partially buried. The
remaining mutations probe the integrity of the various loops. Glycine to
alanine substitutions in the distal loop (G51) and the diverging turn (G29)
decreased stability, while disrupting interactions in the n-src (G40, N36)
and RT loops (D15, S18) either did not affect or slightly increased
stability. The interpretation that the first two structural elements have
more rigid structural requirements than the second two is consistent with our
previous finding that amide protons in the distal loop hairpin and diverging
turn are more protected from exchange than amide protons in the n-src and RT
loops.

Kinetics  of  folding  and  unfolding 

The kinetics of folding of the mutant proteins were characterized using
stopped-flow fluorescence. The mutations were found to fall into three
categories depending on whether the folding rate (kf, Fig. 2a), the unfolding
rate (ku, Fig. 2b), or both were affected (Table 2). It is convenient to use
simple transition state theory to interpret the kinetic data; computational
models of folding for which the folding rate and the free energy difference
between the unfolded state and the transition state (ΔGU-‡) can be determined
independently suggest that the approximation rate = D exp(-ΔGU-‡/RT) provides
an excellent estimate of the folding rate (D is the frequency of transitions
between related conformations). A simple but useful interpretation of the
kinetic data based on this expression is that mutations which decrease kf but
do not alter ku disrupt interactions stabilizing both the transition and the
native state, while mutations which increase ku but do not change kf disrupt
interactions formed after the transition state. A mutation will
simultaneously decrease kf and increase ku in this model if some but not all
of the interactions made by the residue in the native state are also made in
the transition state. The structure of the folding transition state for the
src SH3 domain deduced from the kinetic measurements using this model is
presented below.

Fig. 1 a, Ribbon diagram of the src SH3 domain crystal structure with the
loops and turns labeled. b, Non-local hydrogen bond network connecting the
distal loop (Ser 47 and Thr 50) and the diverging turn (Glu 30). Atomic
coordinates were taken from the crystal structure of src SH3 domain within
the context of the intact tyrosine kinase. A coordinating water molecule may
stabilize the interaction. Hydrogen bonding residues are shown in red; the
two other residues in this region with high ΦF values (Gly 51 and Ala 45) are
shown in magenta. Both images were created with MidasPlus.
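The three kinetic categories described above can be sketched in a few lines of Python. This is only an illustration of the transition state reasoning, not the authors' analysis: the rates are the Table 2 values at fixed denaturant (0.3 M and 3.5 M guanidine), the prefactor D is assumed to be unchanged by mutation, and the 0.5 kcal/mol cutoff and the function names are hypothetical choices made here.

```python
import math

R_KCAL = 1.987e-3   # gas constant, kcal mol^-1 K^-1
TEMP_K = 295.0      # temperature reported for the experiments

# (kf in 0.3 M guanidine, ku in 3.5 M guanidine), s^-1, copied from Table 2
WILD_TYPE = (39.0, 2.1)
MUTANTS = {"S47A": (2.9, 2.4), "V61A": (40.0, 18.0), "L44A": (7.8, 9.3)}


def rt_ln_ratio(k_a, k_b):
    """Free energy difference (kcal/mol) implied by a rate ratio: RT ln(k_a/k_b)."""
    return R_KCAL * TEMP_K * math.log(k_a / k_b)


def classify(kf_mut, ku_mut, cutoff=0.5):
    """Assign a mutant to one of the kinetic categories described in the text.

    cutoff (kcal/mol) is an arbitrary illustrative threshold, not a value
    taken from the paper.
    """
    folding_slowed = rt_ln_ratio(WILD_TYPE[0], kf_mut) > cutoff
    unfolding_faster = rt_ln_ratio(ku_mut, WILD_TYPE[1]) > cutoff
    if folding_slowed and unfolding_faster:
        return "both"
    if folding_slowed:
        return "kf decreased"
    if unfolding_faster:
        return "ku increased"
    return "neither"


for name, (kf, ku) in MUTANTS.items():
    print(name, classify(kf, ku))
```

With these inputs S47A falls in the "kf decreased" class, V61A in the "ku increased" class and L44A in "both", in line with the descriptions of these mutants in the text.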

Distal loop hairpin. The distal loop hairpin consists of two β-strands
connected by a tight β-turn (Fig. 1b). Several mutations probe the integrity
of the hairpin and reveal that it is relatively well structured in the
transition state. Gly 51, located in the β-turn, has a positive φ angle which
is disfavored for all amino acids except glycine; therefore, a substitution
with alanine is likely to disrupt formation of the turn. The G51A mutation
slowed the folding rate, suggesting that the two strands joined by this turn
are brought together at the transition state. Ser 47 and Thr 50 also
contribute to the structure of the turn by forming local hydrogen bonds
between their side chain hydroxyl oxygens and adjacent backbone amide protons
(Fig. 1b). Mutation of these residues significantly slowed the folding rate,
confirming the near-native structure of this part of the molecule in the
transition state. Leu 44 and Tyr 55 interact with each other near the base of
the hairpin on the more solvent exposed side; alanine substitutions at these
positions slowed the folding rate and increased the unfolding rate,
suggesting that the base of the hairpin is partially structured in the
transition state.

Diverging type II β-turn. The diverging type II β-turn joining the RT and the
n-src loops is stabilized by a hydrophobic interaction between two residues
flanking the turn and a local hydrogen bond between the carboxylate of Glu 30
and a backbone amide proton (Fig. 1). A local structure prediction program
based on a library of recurrent sequence-structure motifs identified this
region as the most likely portion of the protein to adopt structure in
isolation, and a seven residue peptide corresponding to this turn has been
found by NMR to be partially structured in solution. Mutation of Glu 30 and
Gly 29 to alanine affected both the folding and the unfolding rate,
suggesting that these residues have not fully formed all of their contacts in
the transition state. However, the interpretation is somewhat complicated by
the fact that these mutations are likely to disrupt residual structure in the
denatured state.

Hydrophobic core. The hydrophobic residues we have characterized fall roughly
into two classes. The first class consists of residues in a hydrophobic
cluster formed by the base of the distal loop hairpin and the strand
following the diverging turn. Mutations in all of these residues (Ala 45,
Ile 34, Leu 32, Ile 56) significantly slowed the folding rate. The second
class consists of residues outside of this hydrophobic cluster that are, for
the most part, partially solvent exposed. Mutations in these residues
(Phe 10, Leu 24, Tyr 16, Val 61) had relatively small effects on the folding
rate. Taken together, these results suggest that the hydrophobic interactions
between the base of the distal hairpin and the strand following the diverging
turn are at least partially formed in the folding transition state. The I34A
mutation slows both the folding and the unfolding rate, suggesting that it
destabilizes the transition state more than the native state; the loss of
interactions may be partially compensated by structural rearrangements or a
relief of strain in the native state.

Hydrogen-bond network. Mutations in the hydrogen bond network residues (S47A,
T50A and E30A) significantly reduce the folding rate. To our knowledge, this
is the first example of the formation of a hydrogen bond cluster in a folding
transition state. As all three residues are involved in both local and
non-local hydrogen bonds in the native state, it is hard to distinguish
conclusively which interactions are important for stabilizing the transition
state. However, the large effect on kf of truncations of hydrophobic residues
which pack between the distal loop hairpin and the diverging turn (L32A and
A45G) suggests that the distal loop hairpin and diverging turn are not only
well structured, but also closely apposed at the transition state. Therefore,
it is likely that Glu 30 and Ser 47 are positioned in the proper geometry for
the formation of a tertiary hydrogen bond.

Unstructured regions. The remainder of the src SH3 domain appears to be
disordered in the transition state. Mutations in the RT loop (D15A, S18A),
the n-src loop (N36A, G40A) and the cluster of surface aromatics either had
no effect on kinetics or exclusively increased the unfolding rate, indicating
that these regions do not play a role in guiding the protein towards its
folded conformation. These structural elements are located on the opposite
side of the molecule from the distal loop hairpin and the diverging turn and
constitute the peptide binding site.

Fig. 2 Dependence of the rate of folding and unfolding on the denaturant
concentration for a, mutants which lower kf and b, mutants which increase ku.
The data for the wild type protein (black) are shown in both panels for
comparison. For several of the mutants, the dependence of the folding rate on
the guanidine concentration is greater than that of the wild type protein;
these mutations may cause some expansion of the denatured state. The color
scheme for the mutants is as follows: (a) A45G, red; S47A, blue; G51A, green;
(b) Y16A, blue; F10I, green; V61A, magenta; D15A, red. The solid lines
represent the fits to the experimental data.

Fig. 3 Structures of a, the src SH3 domain, b, the wt α-spectrin SH3 domain,
c, the distal loop permutant, d, CI2 and e, the Arc repressor dimer, colored
by ΦF value on a continuous scale from red (1) to blue (0). Experimentally
determined values for CI2 were obtained from the original literature and
include only alanine substitutions. ΦF values for Arc repressor were
calculated using data in ref. 23 and 25. Two residues that probably make
extensive interactions in the src SH3 folding transition state are not
colored in (a) because of complications in data interpretation: Ile 34
appears to make stronger interactions in the transition state than in the
native state (both the folding and unfolding rates are slowed), and Leu 32,
which has a substantially decreased folding rate, is so destabilized that an
accurate ΦF value could not be obtained. These residues are in the strand
following the diverging turn. All images were created with MidasPlus.


constraints observed in previous phage selection experiments. Two thirds of
the conserved residues exhibited high ΦF values. One exception was Leu 24
(ΦF = 0.2); however, mutation of this residue to alanine was found to
substantially reduce the protein's affinity for the peptide substrate used in
the selection (data not shown). When many residues are changed
simultaneously, those important for kinetics could be conserved because they
play important roles in specifying protein structure. A similar conclusion
was reached in a lattice simulation study.
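The "two thirds" figure can be checked against the tabulated data with a short sketch. The assumptions here are ours: the nine positions are those listed in Table 1, each is paired with the first of the three ΦF estimates reported for the corresponding mutant in Table 2, and "high" is taken to mean ΦF greater than 0.5 (the cutoff used later in the text when comparing proteins).

```python
# First ΦF estimate from Table 2 for the nine positions listed in Table 1.
PHI_F = {
    "Phe 10": 0.05, "Leu 24": 0.21, "Gly 29": 0.33,
    "Glu 30": 0.79, "Ala 45": 0.72, "Ser 47": 0.87,
    "Thr 50": 0.73, "Gly 51": 0.77, "Ile 56": 0.58,
}


def high_phi(values, cutoff=0.5):
    """Return the positions whose ΦF exceeds the cutoff, in sorted order."""
    return sorted(name for name, phi in values.items() if phi > cutoff)


high = high_phi(PHI_F)
print(high)
print(f"{len(high)} of {len(PHI_F)} conserved positions have ΦF > 0.5")
```

Under these assumptions six of the nine positions qualify, and Leu 24 is among the three that do not.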

It was suggested previously, based on a comparison of ΦF values for CI-2 and
phylogenetic sequence variation within the family of small protease
inhibitors homologous to CI-2, that residues involved in the rate-limiting
step of folding are conserved in evolution. However, for proteins such as
CI-2 in which hydrophobic core residues have the highest ΦF values, it is
difficult to disentangle selection for stability from selection for kinetics,
as core residues generally are the most critical for stability. The SH3
domain is particularly well suited for addressing the relationship between
evolutionary conservation and ΦF values because several residues which make
important interactions in the transition state lie outside of the hydrophobic
core. We find no relationship between phylogenetic variation and ΦF values
for the SH3 mutants described in this paper (data not shown). In fact, the
opposite was observed both with and without exclusion of the binding
residues: residues with high ΦF values were somewhat less conserved than
residues with low ΦF values. The considerably greater opportunity for
compensatory mutations in evolution (the replacement of the hydrogen bonding
Glu 30, Ser 47 pair in the src SH3 domain by two hydrophobic residues in the
PLC γ SH3 domain, for example) compared to the phage selection may account
for the differences in the correlation between ΦF values and sequence
conservation in evolution and in the phage selection.

Topology  and  transition  state  structure 

A previous study on the α-spectrin SH3 domain assessed the importance of
topology in determining the structure of the folding transition state.
Circular permutants had very similar native structures, but displayed
substantially different distributions of ΦF values (Fig. 3b,c). The distal
loop permutant was particularly affected, consistent with our results on the
involvement of this structural element in the transition state. Changes in
chain connectivity thus alter transition state structure. Our data on the src
SH3 domain provide the first opportunity to examine the effect of sequence
divergence on the detailed structure of the transition state (the src and
α-spectrin SH3 domains are only 34% identical in sequence). The placement of
the transition state along the reaction coordinate is very similar for the
fyn, src and spectrin SH3 domains (the mf/m ratios for the three proteins are
within experimental error), suggesting that their transition states may have
similar structures. Comparison of our results with the effects of mutations
in the α-spectrin SH3 domain suggests that this is indeed the case. In
earlier studies of point mutants in the spectrin SH3 domain, two mutants
which reported on the proximity of the distal loop to the rest of the protein
had the highest ΦF values, and in the accompanying paper by Serrano and
coworkers, an Asp to Gly substitution in the distal loop β-turn is shown to
have a ΦF value of 1. The similarity between the src and α-spectrin SH3
domain folding transition states (compare Fig. 3a and 3b) is remarkable given
their significant divergence in sequence and suggests that the exact
identities of the amino acids are not important for the structure of the
transition state. In fact, the critical hydrogen bonding residue Ser 47 in
the distal loop of src SH3 is replaced in α-spectrin by a valine which
contributes to the hydrophobic core. Taken together, the observations that
the structure of the transition state is (i) altered by changes in topology
(circular permutants), but (ii) largely invariant to the large number of
substitutions between the SH3 domains strongly support the idea that topology
is a dominant determinant of the folding mechanism of this family of
proteins.

Table 1 Positions conserved in the combinatorial mutagenesis selection

Position  % Burial  Amino acids (% observed/% allowed)
Phe 10    76        Ile (0/70)   Val (67/10)  Leu (0/10)    Phe (33/10)
Leu 24    69        Ile (0/70)   Val (0/10)   Leu (100/10)  Phe (0/10)
Gly 29    22        Lys (0/25)   Glu (0/25)   Arg (0/25)    Gly (100/25)
Glu 30    53        Lys (0/50)   Glu (100/50)
Ala 45    99        Ile (0/25)   Val (6/25)   Ala (94/25)   Thr (0/25)
Ser 47    79        Ile (0/25)   Val (0/25)   Ala (22/25)   Thr (6/25)    Asn (16/0)  Ser (56/0)
Thr 50    21        Ala (0/25)   Gly (0/25)   Ser (50/25)   Thr (50/25)
Gly 51    54        Lys (0/25)   Glu (0/25)   Arg (0/25)    Gly (100/25)
Ile 56    99        Ile (67/70)  Val (33/10)  Leu (0/10)    Phe (0/10)

[...] recovery expected in the absence of selective pressure given the design
of the library.

Previous work on CI-2 and the P22 Arc repressor suggested that the transition
states of these proteins represent expanded forms of the native state with
most interactions partially formed. The picture for src SH3 domain is quite
different: its transition state appears to be quite polarized, with one
portion of the molecule much more highly ordered than the rest. For both CI-2
and the Arc repressor, the extent to which a residue contributes to the
stability of the transition state is roughly proportional to its contribution
to native state stability, as evidenced by the linear relationship between
ΔΔGU and ΔΔGU-‡ observed in Brønsted plots. In contrast, such a plot for the
src SH3 mutants (data not shown) is more similar to that of barnase, a larger
protein that folds through an intermediate: the data are scattered between
lines with slopes of 0 and 1, indicating non-uniform formation of structure
in the transition state.
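The meaning of the two limiting slopes can be made explicit with a short worked relation. This is the standard textbook formulation rather than an equation reproduced from the paper; it assumes that the prefactor D in the rate expression is unchanged by mutation, and the symbols k_f^wt and k_f^mut (wild type and mutant folding rates) are introduced here for illustration.

```latex
\Delta\Delta G_{U-\ddagger} = RT \ln\frac{k_f^{\mathrm{wt}}}{k_f^{\mathrm{mut}}},
\qquad
\Phi_F = \frac{\Delta\Delta G_{U-\ddagger}}{\Delta\Delta G_{U-F}}
\quad\Longrightarrow\quad
\Delta\Delta G_{U-\ddagger} = \Phi_F \,\Delta\Delta G_{U-F}
```

On a plot of ΔΔGU-‡ against ΔΔGU-F, each mutant therefore lies on a line through the origin whose slope is its ΦF value. A slope of 0 corresponds to a residue whose native interactions are absent in the transition state, and a slope of 1 to a residue whose interactions are fully formed. A single common slope for all mutants indicates a uniformly expanded transition state, whereas scatter between the two limits indicates a polarized one.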

Comparison of the structures of CI-2 and Arc repressor colored by ΦF value
(Fig. 3d,e respectively) with that of the src SH3 domain (Fig. 3a)
accentuates the difference between the transition states of these proteins.
Both CI-2 and Arc repressor show a relatively uniform distribution of low
(blue) and intermediate (magenta) ΦF values, while the src SH3 domain is
split into a high ΦF value region (red) and a low ΦF value (blue) one.
Although a greater number of mutants was generated for CI-2 and the Arc
repressor, the absolute number of residues with ΦF values greater than 0.5 is
still larger for src SH3 than for the two other proteins: six for src SH3
domain versus one for Arc repressor and two for CI2. The unusual features of
the src SH3 transition state may reflect the dominant effect of topology in
specifying the transition state structure. Unlike the other proteins whose
transition states have been characterized, the SH3 domain consists
predominantly of β-sheets. The rate-limiting step in folding may involve
docking of transiently formed structural elements; in β-sheet proteins such
local structure is likely to occur in the vicinity of β-turns, and in helical
proteins, near the middle of helices. Thus, the folding transition states of
β-sheet proteins may be expected to be more polarized and not as centered on
the hydrophobic core as those of helix containing proteins. More generally,
the topology of β-sheet proteins is to some extent determined by the
positions and changes in chain orientation in the β-turns (in the SH3 domain,
the diverging turn is one of the two transitions between the sheets, and the
distal loop β-turn sets up the hydrophobic contacts along the distal loop
β-hairpin), and thus formation of a small number of critical β-turns could be
coupled to the formation of sufficient favorable native interactions to
overcome the entropic barrier to folding. The polarization may also reflect
the importance of hydrogen bonds in stabilizing the transition state:
hydrogen bonds have much stronger orientational constraints than hydrophobic
interactions and are not likely to be stabilizing unless almost fully formed.

In the past several years there has been considerable discussion of the
differences between the 'new' and the classical views of protein folding. For
small, single domain proteins with kinetics and thermodynamics well described
by a two-state model, the distinction primarily concerns the breadth of the
transition state ensemble: in the classical view, the transition state
consists of a relatively well defined set of conformations, whereas for the
funnel shaped energy landscapes suggested by the new view, the set of
conformations can be extremely diverse (in the limit of the models of ref. 6,
the transition state ensemble includes all conformations with a particular
degree of order). It is important to note that the transition state
approximation is valid independent of the homogeneity (or lack thereof) of
the transition state, and that both old and new views are consistent with the
simple exponential kinetics observed for the folding of small proteins. In
the sequence simplification experiments the folding rate was relatively
unchanged by drastic changes in the sequence, consistent with a simple funnel
picture in which interactions stabilizing the native state also stabilize
partially folded conformations. The clustering of mutations which primarily
affect the folding rate (have high ΦF values) in the distal loop and
diverging turn suggests that the transition state ensemble for the src SH3
domain is relatively well defined and thus that the folding funnel departs
considerably from symmetry. Taken together, our data suggest that the folding
free energy landscape of the src SH3 domain is somewhere between that
envisioned in the classical view of folding and the extreme of a completely
symmetrical funnel.

Protein folding landscapes could in principle deviate from symmetry either
because of heterogeneities in inter-residue contact energies or because of
asymmetries in the folded structure. The robustness of the SH3 domain
transition state to large changes in sequence (and hence changes in the
residue-residue interaction energies) indicated by the strong similarities
between the src and spectrin SH3 transition states and the near wild type
folding rates of the simplified SH3 variants, and the contrast with the more
delocalized transition states of Arc repressor and CI-2, suggest that
topological features of the SH3 domain fold rather than heterogeneities in
the contact energies are likely to be responsible for the departure from
symmetry. The importance of topology in determining folding mechanism is
further highlighted by a comparison to λ repressor; two relatively subtle Gly
to Ala substitutions in one of the helices of this protein caused a much
larger change in mf/m (from 0.4 to 0.8) than the very large number of
sequence changes between src SH3, fyn SH3, spectrin SH3, and the simplified
SH3 variant FP1 (ref. 1) (mf/m = 0.69, 0.68, 0.69 and 0.63, respectively). By
analogy with these results, differences in topology may also underlie the
differences in folding scenarios derived from studies of lattice models of
proteins (the delocalized nuclei of ref. 31 versus the specific nucleus of
ref. 32, for example; it was anticipated that the delocalized nuclei scenario
may be more common for small helical proteins). The recent finding that the
folding rates and mf/m ratios for small single domain proteins are correlated
strongly with the average separation between contacting residues suggests
that the relationship between topology and folding mechanism is quite
general.

Our data present a challenge for methods that seek to predict folding
transition state structure using molecular dynamics and other computational
approaches. Agreement between theory and experiment has been claimed in a
number of cases involving CI2, but for this protein it is only necessary to
predict that most interactions are partially formed to achieve reasonable
success. The highly polarized src SH3 transition state will provide a much
more rigorous test of computational models as it requires the precise
identification of crucial residues. To make at least part of the test blind,
we are currently determining ΦF values for the remaining residues in the
structure and invite predictions of these as a test of computational models
of protein folding.

Table 2 Kinetic and thermodynamic parameters for wt SH3 domain and mutants(1)

Name   ΔGU           kf0.3   ku3.5   ΦF (three estimates, see Methods)
       (kcal mol⁻¹)  (s⁻¹)   (s⁻¹)
WT     3.7           39      2.1     -      -      -
F10I   2.7           39      32      0.05   0.08   0.00
D15A   3.3           42      5.5     0.01   0.02   -0.09
Y16A   0.9           24      91      0.10   0.11   0.08
S18A   4.1           46      1.2     *(2)   *      *
L24A   1.9           17      21      0.21   0.25   0.20
G29A   2.1           12      10      0.33   0.39   0.29
E30A   1.8           4.6     7.4     0.79   0.67   0.67
L32A   0.1           4.8     9.6     ^(3)   ^      ^
I34A   3.0           5.1     0.4     5.5    1.61   1.31
N36A   4.0           45      3.3     *      *      *
G40A   3.8           20      1.8     *      *      *
W42A   2.5           27      12      0.07   0.08   0.11
L44A   1.5           7.8     9.3     0.37   0.35   0.34
A45G   2.2           6.9     1.8     0.72   0.58   0.54
S47A   1.2           2.9     2.4     0.87   0.60   0.75
T50A   1.9           3.2     3.5     0.73   0.72   0.57
G51A   2.0           4.7     1.9     0.77   0.67   0.64
Y55A   1.9           8.8     7.5     0.39   0.43   0.35
I56A   1.4           3.9     4.7     0.58   0.49   0.49
Y60A   4.1           40      1.2     *      *      *
V61A   3.1           40      18      -0.01  -0.01  -0.01

(1) All experiments were performed at 295 K using 50 mM sodium phosphate as
buffer. kf0.3 is the folding rate in 0.3 M guanidine, ku3.5 the unfolding
rate in 3.5 M guanidine. Details on the three different methods for
calculating ΦF values are described in the Methods section.
(2) These mutants (*) were more stable than wt SH3.
(3) Due to the very low stability of this mutant (^), thermodynamic and
kinetic parameters are only rough estimates.


Methods 

Mutagenesis  and  purification.  The  SH3  gene  was  cloned  into 
the  Ndel  and  BamHI  sites  of  the  pET  15b  expression  vector 
(Novagen).  Mutagenesis  was  acconnplished  using  the  Quick 
Change  Site-Directed  mutagenesis  kit  (Stratagene).  Plasmids  har¬ 
boring  the  point  mutations  were  transformed  into  BL21  cells,  and 
protein  was  overexpressed  and  purified^.  The  His»Tag®  was  not 
removed  for  the  purposes  of  this  study.  All  mutants  were 
sequenced  to  ensure  that  the  mutagenesis  was  successful  and  the 
purified  proteins  were  analyzed  by  mass  spectrometry  to  confirm 
that  each  mutation  was  the  expected  one. 

Biophysical analysis. In all experiments, protein solutions were
made in 50 mM sodium phosphate (JT Baker), pH 6, and the
temperature was held constant at 295 K. The stability of the point
mutants was assessed by guanidine denaturation using either CD
or fluorescence as described. The kinetics of folding and
unfolding were followed by fluorescence on a Bio-Logic SFM-4
stopped-flow instrument. The unfolding reaction for the wild type
protein was well modeled as a two-state process, and the kinetic
and equilibrium data for the mutants were fit to a two-state
model.

Φ value analysis. There are several different ways of measuring
the values of ΔΔG_U-F and ΔΔG_U-‡ that determine Φ_F. Because of the
possible errors introduced by extrapolation we report three
estimates of the Φ_F value for all the mutants destabilized by more
than 0.5 kcal mol⁻¹: (i) both ΔΔG_U-F and ΔΔG_U-‡ were
computed from kinetic data extrapolated to H2O; (ii) ΔΔG_U-F was
computed from equilibrium data and ΔΔG_U-‡ from
kinetic data extrapolated to H2O; (iii) ΔΔG_U-F was computed
using (ΔC_m × m_avg) and ΔΔG_U-‡ from the folding rate in 0.3 M
guanidine. The values obtained by the three methods match
very closely, confirming the validity of our results. Small
differences are seen only in the significantly destabilized mutants for
which estimates of the equilibrium ΔG_U-F are not very accurate due
to the lack of a folded baseline.
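
As an illustration of estimate (i), the following minimal sketch computes a Φ_F value from folding and unfolding rate constants. It assumes the standard two-state relations ΔΔG_U-‡ = RT ln(kf_wt/kf_mut) and ΔΔG_U-F = RT ln(K_wt/K_mut) with K = kf/ku; these relations, the function names and the example rate constants are assumptions made for illustration and are not taken from the text.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1
TEMP_K = 295.0     # temperature stated in the Biophysical analysis paragraph


def ddg_u_ts(kf_wt, kf_mut, temp=TEMP_K):
    """Change in the unfolded-to-transition-state barrier (kcal/mol)."""
    return R_KCAL * temp * math.log(kf_wt / kf_mut)


def ddg_u_f(kf_wt, ku_wt, kf_mut, ku_mut, temp=TEMP_K):
    """Change in stability from kinetic data, assuming two-state folding."""
    k_eq_wt = kf_wt / ku_wt
    k_eq_mut = kf_mut / ku_mut
    return R_KCAL * temp * math.log(k_eq_wt / k_eq_mut)


def phi_f(kf_wt, ku_wt, kf_mut, ku_mut, temp=TEMP_K, min_ddg=0.5):
    """Phi value, or None if the mutant is destabilized by <= min_ddg."""
    stability_change = ddg_u_f(kf_wt, ku_wt, kf_mut, ku_mut, temp)
    if stability_change <= min_ddg:
        return None
    return ddg_u_ts(kf_wt, kf_mut, temp) / stability_change


# Hypothetical rate constants in s^-1 (not measured values).
print(phi_f(kf_wt=10.0, ku_wt=1.0, kf_mut=1.0, ku_mut=1.0))    # folding slowed only
print(phi_f(kf_wt=10.0, ku_wt=1.0, kf_mut=10.0, ku_mut=10.0))  # unfolding sped up only
```

A mutation that slows only folding gives a Φ_F near 1, and one that speeds only unfolding gives a Φ_F near 0; the 0.5 kcal mol⁻¹ cutoff mirrors the threshold quoted above.
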
Previous studies of the conformations of peptides spanning the length of
the α-spectrin SH3 domain suggested that SH3 domains lack
independently folding substructures. Using a local structure prediction method
based on the I-sites library of sequence-structure motifs, we identified a
seven residue peptide in the src SH3 domain predicted to adopt a
native-like structure, a type II β-turn bridging unpaired β-strands, that was not
contained intact in any of the SH3 domain peptides studied earlier. NMR
characterization confirmed that the isolated peptide, FKKGERL, adopts a
structure similar to that adopted in the native protein: the NOE and
coupling constant patterns were indicative of a type II β-turn, and NOEs
between the Phe and the Leu side-chains suggest that they are juxtaposed
as in the prediction and the native structure. These results support the
idea that high-confidence I-sites predictions identify protein segments
that are likely to form native-like structures early in folding.

© 1998 Academic Press

Keywords: SH3 domain; folding initiation site; protein folding; local
structure; β-turn


Introduction

There has been considerable discussion about
the role of local interactions in protein folding
(Abkevich et al., 1995; Avbelj & Moult, 1995; Doyle
et al., 1997; Fersht, 1995; Munoz & Serrano, 1996;
Unger & Moult, 1996). Direct experimental studies
are hampered because early events of protein
folding are likely to take place within the microsecond
time-scale. To investigate the role of β-turns,
peptide fragments derived from proteins have
been studied as models for early events in protein
folding (Dyson et al., 1988, 1992). Those studies
have shown that sequences that form turns in
proteins can also form reasonably stable turns in short
peptides in aqueous solution. However, because
non-local interactions play an important role in


Abbreviations used: 1D, one-dimensional; ppb, parts
per billion; ppm, parts per million; NOE, nuclear
Overhauser effect; TOCSY, total correlated spectroscopy;
ROESY, rotating frame Overhauser effect spectroscopy;
TSP, 3-(trimethylsilyl)propionic-2,2,3,3-d4 acid;
MALDI-MS, matrix assisted laser desorption ionization
mass spectrometry.

E-mail address of the corresponding author:
‘^ker@ben.bchem.washington.edu


stabilizing structure in both the native state and
denatured state (Wang & Shortle, 1997), many
peptides derived from proteins do not adopt
well-defined structure in isolation. In particular, a recent
study of five peptides derived from the α-spectrin
SH3 domain failed to detect any persistent
structure and it was concluded that folding initiation
sites do not play a role in the folding of this all-β
protein (Viguera et al., 1996).

We have recently developed a method for local
protein structure prediction based on a library
(I-sites) of 7 to 19 residue sequence patterns that
strongly correlate with local protein structural
features (Bystroff & Baker, 1997, 1998). The sequence
segments matching a particular pattern almost
always adopt the same conformation in a wide
range of protein structures, suggesting that local
interactions within the segment are strong enough
to override the differences in non-local interactions.
Inspection of the sequence-structure motifs in most
cases readily reveals the interactions that stabilize
the observed structure. One of the novel motifs is a
"diverging type II β-turn" stabilized by a
side-chain-to-backbone hydrogen bond and a pair of
inwardly turned hydrophobic residues bracketing
the turn (Bystroff & Baker, 1998). The turn is
referred to as diverging because the two strands
connected by it do not form backbone hydrogen
bonds.

We have proposed that, since the interactions
within the sequence segments override non-local
interactions, peptide segments that closely match
one of the sequence patterns are likely to adopt
structure in isolation and potentially serve as
folding initiation sites. Because the SH3 domain had
been proposed to lack such peptide segments, we
were curious about I-sites predictions of local
structure for this protein family. Interestingly, the
peptide segment predicted to be the most likely to
have structure in isolation was a diverging type II
β-turn that was not contained intact in any of
the α-spectrin SH3 peptides studied previously
(Viguera et al., 1996). In this study, we demonstrate
that the sequence FKKGERL, derived from the
diverging type II turn in the src SH3 domain,
adopts that conformation in isolation.


Results 

Local  structure  prediction 

Figure 1(a) shows the location of all correct (red)
and incorrect (blue) fragment structure predictions
(see Materials and Methods) for the src SH3
domain. The two highest-confidence predictions
occurred around the position of the type II β-turn
at residues 27 to 30 (shaded region in Figure 1(b));
both correctly predict a type II turn, an inwardly
turned glutamate side-chain, and the juxtaposition
of the Phe26 and Leu32 side-chains. We selected
the peptide FKKGERL (residues 26 to 32) for
structural studies, omitting three residues of the
β-strand. The high-confidence prediction is a
consequence of the similarity between the sequence
pattern for the I-sites diverging turn motif
(Figure 2(a)) and the SH3 multiple sequence
alignment at residues 26 to 32 (Figure 2(c)). As
expected, given the strong similarity of the
sequence patterns, the predicted structure
(Figure 2(b)) is very similar to that in the crystal
structure of the src SH3 domain (Figure 2(d)).

NMR  study  of  the  peptide  conformation 

To avoid artifacts due to non-native electrostatic
interactions between the N and C termini, we
studied the structure of two forms of the peptide: one
with free N and C termini, the other with an
acetylated N terminus and an amidated C
terminus. The results were virtually identical for the two
peptides; for brevity, we present only the data for
the blocked peptide.

Backbone  conformation 

The proton NMR spectrum of the peptide was
completely assigned using a 50 ms TOCSY
experiment and confirmed by a 250 ms ROESY
experiment. The chemical shifts for all proton resonances
Figure 1. (a) I-sites fragment predictions for the SH3
domain sequence family. Red bars indicate correct
predictions according to the crystal structure (1FMK); blue
bars are incorrect predictions. The text box shows the
sequence and secondary structure of the src SH3
domain (S, strand; T, turn; L, loop; 3, 3₁₀ helix). The
peptides of α-spectrin SH3 domain studied by Viguera
et al. (1996) are indicated by the rectangular bars in
the graph. (b) Backbone trace of the SH3 domain
(1FMK) showing the location of the diverging turn
(shaded).
Figure 2. (a) Sequence profile and (b) paradigm structure
(1EFT residues 247 to 253) for the diverging turn
motif from the I-sites library (Bystroff & Baker, 1998).
(c) Sequence profile for the diverging turn segment in
the SH3 domain sequence family, and (d) the native
structure of this portion of the src SH3 domain (1FMK
residues 102 to 108; Xu et al., 1997). Numbering in (c)
and (d) is consistent with our previous studies (Riddle
et al., 1997). The colors in the profile tables represent the
log likelihood ratio, log(Pij/Pi), where Pij is the
frequency of amino acid i at position j in the I-sites
sequence pattern (a) or SH3 domain multiple sequence
alignment (c), and Pi is the average frequency of amino
acid i in the proteins. The color scale is shown in the
central panel; favored amino acids are in red and
disfavored in blue. Conserved polar residues in the sequence
profiles are shown in green; conserved non-polar
residues in purple. Solvent-accessible surfaces are drawn
around the non-polar side-chains. The Figure was made
using Raster3D (Merritt & Murphy, 1994) and
Mathematica (Wolfram Research, Inc).


are summarized in Table 1. The NOE pattern
observed in ROESY spectra, the three-bond ³JNHα
coupling constants, and the temperature
coefficients of the amide protons are summarized in
Figure 3. As described in the following paragraphs,
all the NMR parameters indicate that a type II
β-turn conformation is significantly populated.

β-Turns yield characteristic patterns of sequential
and medium-range NOEs and ³JNHα coupling
constants (Wuthrich, 1986). NOE patterns indicative of
β-turns include a dNN(i, i+1) NOE between residues 3 and
4 and a dαN(i, i+2) NOE between residues 2 and 4
in the turn. Type I and II turns are distinguished
by the relative strengths of the dαN(2,3) NOE
(stronger for type II) and the dNN(2,3) NOE
(stronger for type I). A strong dNN(i, i+1)
NOE between Gly29 and Glu30, and a less strong
dαN(i, i+2) NOE between Lys28 and Glu30 in
the peptide were observed in ROESY spectra
(Figure 4(a)). The presence of this characteristic
NOE pattern suggests a high preference for a
β-turn conformation for the Lys-Lys-Gly-Glu
segment in the peptide. The pattern of a strong dαN
and a weak dNN NOE between Lys28 and Gly29,
along with a very weak NOE between Lys28
and Glu30 (observed at lower thresholds, but not
shown in Figure 4(a)), is most consistent with a
type II β-turn (Campbell et al., 1995; Wuthrich,
1986). β-Turns are characterized also by a small
value (≤5 Hz) of ³JNHα for residue 2 in the turn
(Wuthrich, 1986). The observed ³JNHα coupling
constant for Lys28 (5.0 Hz, Figure 3), compared to the
value of 6.6 Hz for Lys in the random coil
conformation (Smith et al., 1996), is consistent with
its position as residue 2 in a turn conformation.

In most β-turns there is a backbone-backbone
hydrogen bond between the carbonyl oxygen atom
of residue 1 and the NH group of residue 4. This
hydrogen bond often leads to a small temperature
coefficient (0 < −Δδ/ΔT < 5 ppb K⁻¹) for the
amide proton of residue 4 (Rose et al., 1985). The
amide proton of Glu30 has a temperature
coefficient of −4.6 ppb K⁻¹ (Figure 3) (compared to
6 < −Δδ/ΔT < 10 ppb K⁻¹ for amide groups in a
random coil conformation), suggesting a role as a
hydrogen-bond donor, as expected for residue 4 in
a turn conformation. This, combined with the
evidence for a type II turn conformation, would
implicate the carbonyl oxygen atom of Lys27 as the
likely H-bond acceptor.
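
The two criteria used above can be applied mechanically to the per-residue values listed in Figure 3. The sketch below is only an illustration: the data are the ³JNHα and −Δδ/ΔT values from Figure 3, while the function name and the exact cutoffs (5 Hz and 5 ppb/K) are assumptions chosen to match the ranges quoted in the text.

```python
# Per-residue values as listed in Figure 3 of this paper.
RESIDUES = ["Phe26", "Lys27", "Lys28", "Gly29", "Glu30", "Arg31", "Leu32"]
J_NH_ALPHA_HZ = [6.0, 7.5, 5.0, 5.5, 6.5, 6.5, 6.5]
NEG_TEMP_COEFF_PPB_K = [8.7, 9.4, 7.9, 9.7, 4.6, 8.3, 9.5]  # -delta/T values


def flag_turn_indicators(j_cutoff=5.0, tc_cutoff=5.0):
    """Return residues meeting either turn criterion (cutoffs are assumed)."""
    flags = {}
    for name, j, tc in zip(RESIDUES, J_NH_ALPHA_HZ, NEG_TEMP_COEFF_PPB_K):
        notes = []
        if j <= j_cutoff:
            notes.append("small coupling constant")
        if 0 < tc < tc_cutoff:
            notes.append("low temperature coefficient")
        if notes:
            flags[name] = notes
    return flags


for residue, notes in flag_turn_indicators().items():
    print(residue, notes)
```

With these cutoffs only Lys28 (coupling constant) and Glu30 (temperature coefficient) are flagged, which are residues 2 and 4 of the Lys-Lys-Gly-Glu turn.
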


Table 1. Proton assignments in 50 mM sodium phosphate at pH 6.0 and 12°C

Residue  NH    CαH         CβH         Others
Phe26    8.34  4.54        3.09, 3.01  δ7.26, ε7.36, ζ7.32
Lys27    8.45  4.30        1.78, 1.68  γ1.38, δ1.65, ε2.99
Lys28    8.48  4.20        1.80, 1.73  γ1.48; 1.45, δ1.71, ε3.03
Gly29    8.65  3.93, 3.99
Glu30    8.04  4.31        1.95, 2.04  γ2.09; 2.25
Arg31    8.55  4.31        1.84, 1.78  γ1.61, δ3.19, 7.25
Leu32    8.41  4.31        1.67, 1.60  γ1.62, δ0.93; 0.87

TSP was used as a reference for chemical shifts.

Protein residue    26   27   28   29   30   31   32
Peptide position    1    2    3    4    5    6    7
Sequence            F    K    K    G    E    R    L
³JNHα (Hz)         6.0  7.5  5.0  5.5  6.5  6.5  6.5
−Δδ/ΔT (ppb/K)     8.7  9.4  7.9  9.7  4.6  8.3  9.5

NOE rows in the figure: dαN(i, i+1), dNN(i, i+1), dαN(i, i+2),
dβN(i, i+2), dNβ(i, i+3), dNγ(i, i+3).

Figure 3. Observed NOE patterns, values of the ³JNHα
coupling constants, and the temperature coefficients for
the peptide FKKGERL. The widths of the lines represent
the NOE intensities. The broken lines represent some of
the (i, i+1) NOEs that overlap with (i, i) NOEs.
The results indicative of the type II β-turn conformation
are indicated with asterisks (*).


Side-chain interactions that stabilize the β-turn

In  addition  to  the  backbone  hydrogen  bond 
(Lys27  CO-Glu30  NH)  interaction  described  above, 
there  are  two  other  interactions  observed  in  the 
Figure 5. Portion of the ROESY spectrum displaying
the NOEs between the aromatic protons of Phe26 and
the methyl groups of Leu32 at 12°C and pH 6.0.


folded protein that may contribute to the stability
of the isolated β-turn (Figure 2(d)). The first is a
hydrophobic interaction between the aromatic ring
of Phe26 and the methyl groups of Leu32. NOE
cross-peaks between the aromatic protons of Phe26
and the methyl groups of Leu32 are clearly visible
in the ROESY spectrum at 12°C (Figure 5),
indicating that conformations are populated in which the
aromatic ring and the methyl groups are in
relatively close contact (<5 Å). The side-chain NOE
between Phe26 and Leu32 is observed at 12°C but
Figure 4. ROESY spectrum at pH 6.0 and 12°C. (a) the C(α, β)H - NH region, (b) the NH - NH region. A 250 ms
mixing time was used. Only the inter-residue NOEs are labelled in the spectra.
Figure 6. Dependence of the amide proton chemical
shifts on pH. The amide protons of K27, K28 and E30
(continuous curves) are sensitive to pH, while the amide
protons of F26, G29 and L32 (broken curves) are not.


not at 24°C (data not shown). These results suggest
that the two hydrophobic side-chains are
juxtaposed as in the prediction and in the native
structure.

The second native interaction is a hydrogen
bond between a side-chain carboxylate oxygen
atom of Glu30 and the backbone amide proton of
Lys27. We reasoned that such an interaction
would be dependent on the ionization state of
the Glu30 side-chain, the only ionizable group in
the peptide with a pKa between 2.5 and 6.5, and
carried out a pH titration to investigate this
possibility. The amide proton of Lys27 undergoes
the largest chemical shift change of any amide
over a pH range of 2.5 to 6.5. When the Glu30
side-chain is deprotonated at pH 6.0, the amide
proton of Lys27 in the peptide is shifted
downfield by ~0.22 ppm relative to pH 2.6 (Figure 6).
This shift is consistent with an interaction with
the Glu30 side-chain, as hydrogen bonding will
cause deshielding of the amide proton, which
generally results in a downfield shift for the
amide proton. The temperature coefficient of
this amide proton is greater than the expected
value for an amide proton involved in hydrogen
bonding in proteins (−9.4 ppb K⁻¹, compared with
a magnitude below 5 ppb K⁻¹) but, since an increase in
temperature is likely to increase the mobility of the
Glu30 side-chain significantly, the temperature
coefficient is probably not an accurate indicator
of side-chain to main-chain hydrogen bond
formation in the peptide. Weak NOEs between the
β and γ protons of the Glu30 side-chain and the
amide proton of Lys27 were observed at low
thresholds in ROESY spectra (Figure 3). These
results further suggest that the side-chain of
Glu30 is pointed toward the Lys27 amide proton,
consistent with a hydrogen bond between the
Lys27 amide proton and the side-chain carboxyl
oxygen atom of Glu30.


Discussion 

The structure of fragments of the src-SH3
domain was predicted using the I-sites library. The
peptide fragment with the highest confidence
prediction, FKKGERL, was synthesized and its
structure characterized by proton NMR. The NMR
parameters support the prediction that the peptide
adopts a native-like diverging type II β-turn
(Figure 2(b)). This is the first short peptide from an
SH3 domain that has been demonstrated to adopt
a native-like conformation in isolation.

I-sites predictions and the conformational
preferences of isolated peptides

The diverging turn appears to be the lowest free
energy conformation of the short peptide studied
here. There are three factors that favor the
diverging turn conformation for the peptide FKKGERL
(Figure 2(b)): the hydrophobic interaction between
the Phe at position 1 and the Leu at position 7, the
Gly at position 4 allowing a positive phi angle, and
the Glu at position 5 that forms a hydrogen bond
with the backbone amide group at position 2. The
pattern of sequence conservation in the I-sites
profile shows each of these features, but it shows also
features that can be explained only by negative
design. Polar side-chains are conserved in positions
3 to 5 (Figure 2(a)); this prevents the formation of a
stable amphipathic α-helix, since the latter requires
conserved non-polar side-chains separated by three
or four residues. Similarly, polar side-chains in
positions 3 and 5 destabilize the amphipathic
β-strand conformation, which prefers non-polar
residues in those positions. Position 3 in the profile
is always polar, but is never observed to be Asp or
Asn. This may be negative design against the type
I β-turn, which prefers Asp or Asn followed by
Gly.

The lack of stable structure in the SH3 peptides
studied by Serrano and colleagues (Viguera et al.,
1996) is consistent with the I-sites predictions. As
illustrated in Figure 1(a), there is no
high-confidence prediction spanning the five peptides that
they studied. Two of the peptides contain portions
of the diverging turn; however, neither contains
the whole segment (the sequence MKKGDIL in the
α-spectrin SH3 domain).

The confidence of a fragment prediction may be
related to the fragment's free energy of folding in
isolation. The confidence of I-sites predictions is
defined as the fraction of peptide segments with a
certain similarity score that have the predicted
structure (Bystroff & Baker, 1998). For example, all
seven residue peptide segments in the protein
database were scored against the sequence profile
for the diverging turn motif (Figure 2(a)), and 94%
of all segments with a similarity score between 144
and 151 were found to have the diverging turn
structure; therefore, a new sequence segment with
a score of 147 has an estimated 94% probability of
being a diverging turn. Because the segments in
the database from which the I-sites library was
derived all differ in their respective global
structures, the strong structural propensities must be
due to internal contacts conserved within I-sites
clusters. Clearly, the relation of confidence to
equilibrium concentration, and through this to free
energy, is a crude one. Nonetheless, the evidence
suggests that the relation holds qualitatively, at
least for sequence segments with high confidence
scores.
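
The definition of confidence given above (the fraction of database segments in a score range that have the predicted structure) can be sketched as follows. The function name, the data layout and the calibration set are hypothetical; the example set is invented so that it reproduces the 94% figure quoted for scores between 144 and 151, and it is not the authors' database.

```python
def confidence_in_bin(scored_segments, low, high):
    """Fraction of segments scoring in [low, high] that have the structure.

    scored_segments: iterable of (similarity_score, has_predicted_structure).
    Returns None when no segment falls in the score range.
    """
    in_bin = [has_structure for score, has_structure in scored_segments
              if low <= score <= high]
    if not in_bin:
        return None
    return sum(in_bin) / len(in_bin)


# Invented calibration set: 47 of 50 segments in the 144-151 bin are turns.
calibration = ([(145, True)] * 47) + ([(150, False)] * 3) + ([(120, False)] * 10)
print(confidence_in_bin(calibration, 144, 151))
```

A new segment whose score falls in the same range would then be assigned this fraction as its estimated probability of adopting the motif structure.
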

The studies by Dyson et al. (1992) on short
peptides comprising the entire length of the
β-sandwich protein plastocyanin showed that only a
small number of peptides had any tendency to
form structure in isolation. Interestingly, the
peptide with the strongest structural tendencies was a
diverging β-turn similar to that studied here. To
further explore the relationship between I-sites
predictions and peptide conformational preferences,
I-sites predictions were made for the plastocyanin
sequence family. The three regions found
previously to have some native structural tendencies
in isolation were all contained within correct I-sites
structure predictions with a confidence of 0.60 or
greater.

We present several more I-sites predictions
(Table 2) to be confirmed or refuted by future
experimental evidence. These short sequences were
chosen mostly from small proteins, some of which
are presently under intensive experimental study.
They are predicted to adopt a significant amount
of native structure in isolation. Structure-blind
I-sites predictions can be made automatically via
http://ganesh.bchem.washington.edu/~bystroff/Isites/.

Role  of  diverging  turn  in  protein  folding 

It is interesting to compare our results with
recent structural information obtained on unfolded
states of the drkN SH3 domain under both folding
conditions (Uexch) and denaturing conditions
(Zhang & Forman-Kay, 1997). The most marked
differences between the Uexch and denatured states are
located in the segment 19FRKTQILKIL28, which
corresponds to 26FKKGERLQIV35 in the src SH3
domain (the diverging turn segment comprises the
first seven residues).

A turn conformation was populated around
19FRKT22 in the Uexch state but not in the denatured
state; residues in 23QILKIL28 exhibited severe
line-broadening in the Uexch state that disappeared upon
addition of denaturant, suggesting that this portion of
the chain adopts a structure that is in intermediate exchange
with other conformational substates. By analogy to
our results, the conformational substates populated
by this portion of drk may include a
similar type of diverging turn conformation, which
may be less stable in drk than in src because of the
replacement of the glycine residue by threonine
(Figure 2(a)).

The characteristic features of the diverging turn
sequence motif are highly conserved in all known
SH3 domains (Koyama et al., 1993). Each of the
other hairpin turns in the molecule shows
evolutionary variability in both sequence and length.
This portion of the structure is less variable among
the three-dimensional structures of SH3 domains
solved to date than any other segments outside of
the hydrophobic core (Guruprasad et al., 1995). The
strong sequence and structural conservation within
the diverging turn along with the drk denatured
state and the src peptide studies may indicate an
important role for the turn in the folding process.

It is interesting that the turn in the SH3 domain
with the strongest propensity to adopt structure in
isolation is not a tight β-hairpin; instead, the turn
connects strands that do not share backbone
hydrogen bonds (the earlier peptide studies on
α-spectrin SH3 focused on the β-hairpins). The SH3
domain may be viewed as two orthogonally
packed β-sheets, where the diverging β-turn is one
of the two transitions between the sheets in the
protein. Formation of the diverging β-turn
conformation early in folding could play an important
role in establishing the topology of the protein by
preventing inappropriate formation of a β-hairpin
and promoting proper packing of hydrophobic
side-chains between the diverging strands. The
observation of a structured diverging turn peptide
in plastocyanin suggests that such a role may be a
common feature of the folding of β-sheet proteins.
Our results show that the diverging turn in
src-SH3 is stable in isolation and, because all of the
interactions are local, it undoubtedly forms very
rapidly. This is consistent with kinetic studies of
src SH3 mutants (Grantcharova et al., 1998), which
suggest that the diverging turn and the following
strand come together with the distal loop hairpin
in the folding transition state.

An attractive feature of the protein folding
problem is that it is amenable to both computational
and experimental approaches. The work described
here illustrates that a combination of these
approaches can provide significantly more insight
than either one alone. Ab initio structure prediction
methods even in their current imperfect state can
generate hypotheses to guide experimental studies
of the folding process.

Materials  and  Methods 

Prediction  of  peptide  conformation 

A sequence profile (Gribskov et al., 1990) was
constructed from an alignment of SH3 domain sequences in
the SWISSPROT database, aligned initially via the PHD
server (Rost et al., 1994) then modified to agree with
earlier structural alignments (Feng et al., 1995; Guruprasad
et al., 1995). All subfragments of the profile were scored
against all motifs in the library as described elsewhere
(Bystroff & Baker, 1998). Scores were translated into
confidence values using the results of cross-validation
studies on a large non-redundant database of proteins of
known structure; the confidence of a prediction is simply
the probability that the prediction is correct.
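
To make the idea of scoring a sequence against a motif profile concrete, the sketch below builds a log-odds table of the form log(Pij/Pi) and slides it along a sequence. This is a deliberately simplified stand-in, not the I-sites scoring function (which compares profiles with profiles, as described in Bystroff & Baker, 1998); the three-position toy profile, the uniform background, the penalty for unseen residues and all function names are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_profile(position_freqs, background):
    """Convert per-position frequencies into log(Pij / Pi) tables."""
    return [{aa: math.log(freq / background[aa])
             for aa, freq in column.items() if freq > 0}
            for column in position_freqs]


def score_segment(segment, profile, missing=-2.0):
    """Sum log-odds over a segment; unseen residues get an assumed penalty."""
    return sum(column.get(aa, missing) for aa, column in zip(segment, profile))


def best_window(sequence, profile):
    """Return (score, start index) of the best-scoring window."""
    width = len(profile)
    return max((score_segment(sequence[i:i + width], profile), i)
               for i in range(len(sequence) - width + 1))


# Toy three-position motif and a uniform background (both invented).
background = {aa: 0.05 for aa in AMINO_ACIDS}
toy_freqs = [{"F": 0.5, "L": 0.5}, {"K": 1.0}, {"G": 1.0}]
profile = log_odds_profile(toy_freqs, background)
print(best_window("AAFKGAA", profile))
```

In a real application the window scores would then be converted to confidence values by calibration against proteins of known structure, as stated above.
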

NMR  examination  of  the  peptide  conformation 

The sequence Phe-Lys-Lys-Gly-Glu-Arg-Leu was
synthesized and purified by Research Genetics Co.
(Huntsville, AL), and the molecular mass was confirmed by
MALDI-MS. Two forms of the peptide, one with free
amino and carboxyl groups at the N and C termini, the
other with an acetylated N terminus and an amidated C
terminus, were investigated by NMR. NMR samples
were prepared by dissolving ~10 mg of peptide in
50 mM sodium phosphate buffer (90% H2O/
10% ²H2O). The pH of the samples was adjusted using
diluted NaOH or HCl solutions. TSP was added to the
samples to a final concentration of 0.5 mM for
chemical shift referencing.

NMR spectra were acquired on a Bruker DMX500
spectrometer at 12°C unless otherwise specified.
¹H-TOCSY spectra (Bax & Davis, 1985a) were collected
using spectral widths of 6250 Hz in both dimensions and
1024 × 600 complex points. ¹H-ROESY spectra (Bax &
Davis, 1985b) were collected with a spin-lock field
of 8.62 kHz, a spectral width of 6250 Hz in both
dimensions and 1024 × 600 complex points. The mixing
times for the TOCSY and ROESY experiments were 50 ms
and 250 ms, respectively. Data were apodized with a
squared sine bell in both dimensions, and zero-filled to
1 K × 1 K complex spectra. Water suppression was
achieved with the Watergate pulse sequence (Sklenar
et al., 1993). The recycle delay was 2.2 s for all the
experiments. All the data processing was performed on an
SGI workstation using the program NMRPipe (Delaglio
et al., 1995).

The ³JNHα coupling constants were obtained directly
from the resolved amide proton resonances in the 1D
spectrum collected with a digital resolution of 0.43 Hz/
point. The temperature coefficients of the amide protons
were obtained from linear fits of the chemical shift data
from 1D spectra acquired at 9, 12, 15, 18, 21, 24, 27, 30,
33 and 36°C.
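
The linear fit used to obtain a temperature coefficient can be sketched as an ordinary least-squares slope of chemical shift against temperature. The temperatures are those listed above; the chemical shift series is synthetic (generated from the Glu30 values of 8.04 ppm at 12°C and −4.6 ppb/K purely for illustration), and the function names are assumptions.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def temperature_coefficient_ppb_per_k(temps_c, shifts_ppm):
    """Slope of chemical shift versus temperature, converted to ppb/K."""
    slope_ppm_per_k, _ = linear_fit(temps_c, shifts_ppm)
    return slope_ppm_per_k * 1000.0


temps_c = [9, 12, 15, 18, 21, 24, 27, 30, 33, 36]
# Synthetic shifts, not measured data: 8.04 ppm at 12 C, slope -4.6 ppb/K.
shifts_ppm = [8.04 - 0.0046 * (t - 12) for t in temps_c]
print(round(temperature_coefficient_ppb_per_k(temps_c, shifts_ppm), 2))
```

Because a temperature difference in °C equals the same difference in K, the slope can be fitted directly against the Celsius values.
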
We use a combination of experiments, computer
simulations and simple model calculations to characterize, first,
the folding transition state ensemble of the src SH3
domain, and second, the features of the protein that
determine its folding mechanism. Kinetic analysis of mutations
at 52 of the 57 residues in the src SH3 domain revealed that
the transition state ensemble is even more polarized than
suspected earlier: no single alanine substitution in the
N-terminal 15 residues or the C-terminal 9 residues has
more than a two-fold effect on the folding rate, while such
substitutions at 15 sites in the central three-stranded
β-sheet cause significant decreases in the folding rate.
Molecular dynamics (MD) unfolding simulations and ab
initio folding simulations on the src SH3 domain exhibit a
hierarchy of folding similar to that observed in the
experiments. The similarity in folding mechanism of different
SH3 domains and the similar hierarchy of structure
formation observed in the experiments and the simulations can
be largely accounted for by a simple native state
topology-based model of protein folding energy landscapes.

Three independent lines of investigation suggest that protein
folding rates and mechanisms are largely determined by
native state topology. First, dramatic changes in amino acid
sequence, produced either in the laboratory or by the evolutionary
process, that do not alter the overall topology of a
protein usually have relatively little effect on protein folding
rates. Second, comparison of the consequences of mutations
on folding kinetics in distantly related homologs suggests that
folding transition state structure is conserved despite differences
in amino acid sequence and stability. Third, the folding
rates of small proteins are strongly correlated with a
property of the native state topology: the average sequence
separation between residues that make contacts in the
three-dimensional structure (the contact order). The influence of
native state topology on protein folding rates and mechanisms
is a consequence of the relatively large entropic cost of forming
nonlocal interactions early in folding: simple topologies with
mostly local interactions are more readily formed than those
with many nonlocal interactions, and for a given topology,
local interactions are more likely to be formed early in folding
than nonlocal interactions.
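As a concrete reading of the contact order defined above (the average sequence separation between contacting residues), the following sketch computes it from a list of residue pairs. The contact lists are toy data invented for illustration, and how contacts are detected in a real structure is left open here.

```python
def contact_order(contacts):
    """Average sequence separation |i - j| over residue pairs in contact.

    `contacts` is a list of (i, j) residue-number pairs; the criterion used
    to decide that two residues are in contact is outside this sketch.
    """
    if not contacts:
        raise ValueError("at least one contact is required")
    return sum(abs(i - j) for i, j in contacts) / len(contacts)

# Toy examples (assumed data): a tight hairpin has mostly local contacts,
# while a sheet that pairs the two chain termini has nonlocal ones.
local_hairpin = [(1, 4), (2, 5), (3, 6)]
terminal_sheet = [(1, 40), (2, 39), (3, 38)]

co_local = contact_order(local_hairpin)
co_nonlocal = contact_order(terminal_sheet)
```

In this toy case the terminal sheet has a much higher contact order than the hairpin, which is the sense in which a topology with many nonlocal interactions is expected to be slower to form.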

SH3 domains are an ideal system to investigate how topology
determines folding mechanisms. Over 400 different naturally
occurring SH3 domain sequences have been identified,
more than 10 high-resolution structures have been determined,
and the stability and folding kinetics of a number of
these proteins have been characterized. We had found that
many of the residues conserved in a phage display selection for
simplified src SH3 domain variants played an important role
in determining the folding mechanism. Kinetic analysis of
mutations in 20 of the 57 positions in the protein suggested
that the distribution of structure in the transition state ensemble
was localized to one portion of the molecule, and that the
folding transition state of the src SH3 domain resembled that
of the α-spectrin SH3 domain, which has an almost identical
topology but only 36% sequence identity. Though suggestive,
with less than half of the residues accounted for, these results
did not thoroughly characterize the transition state ensemble
of the protein, and did not provide an explanation for the
similarity in the src and spectrin SH3 transition states.

In  this  paper  we  present  a  combination  of  experiments, 
computer  simulations,  and  simple  model  calculations  aimed 
at detailed characterization of the src SH3 transition state and
its  structural  origins.  The  experiments  fully  map  out  the  tran¬ 
sition  state  ensemble  by  probing  the  kinetic  consequences  of 
mutations  of  every  residue  that  makes  appreciable  interactions 
in  the  native  state.  The  computer  simulation  studies  assess  the 
robustness  of  the  hierarchy  of  structure  formation  to  the 
numerous  approximations  and  likely  inaccuracies  in  computa¬ 
tional  models  of  folding.  Finally,  the  simple  model  calcula¬ 
tions  probe  the  topological  features  that  determine  the  way 
SH3  domains  fold.  Our  results  provide  perhaps  the  most  com¬ 
prehensive  picture  of  the  rate-limiting  step  in  folding  of  an  all 
β-sheet protein available to date.

Experimental  studies 

The SH3 domain is a 57-residue globular protein that consists
of two antiparallel β-sheets orthogonally packed to form a single
hydrophobic core (Fig. 1). Here we describe the effects of
mutations of all residues more than 10% buried in the native
structure (52 of 57 residues in the protein) on the rates of folding
and unfolding, and the picture of the folding transition
state that emerges from these data.

The method we employ was pioneered by Fersht and
coworkers¹⁵ and has emerged as the predominant experimental
procedure for the detailed characterization of folding
transition states. The extent to which a residue's interactions
are formed in the transition state is summarized by the Φf
value (ΔΔGU-‡/ΔΔGU-F), which is the change in the free energy
of the transition state brought about by mutation of the
residue normalized by the change in overall stability. A Φf
value of 1 indicates that all of a residue's interactions are
formed in the transition state, whereas a Φf value of 0 means
that the residue does not make stabilizing interactions in the
transition state. Intermediate Φf values indicate partially
formed interactions or interactions formed in a fraction of the
transition state ensemble; the relationship between the actual
Φf value and the extent of structure formation is not necessarily
linear. As emphasized by Fersht and coworkers, the most
straightforward class of mutations to interpret are those that
remove a small number of methyl groups, such as isoleucine to
valine, alanine to glycine, and valine to alanine, as these are
least likely to change the folding mechanism and the structure
of the folded and unfolded states. In this study, we have also
mutated polar residues to alanine to examine the role of polar
interactions and hydrogen bonds in the transition state, and
have substituted glycine residues with alanine to probe turn
formation in the transition state. To guard against possible
artifacts due to changes in denatured state structure and/or
folding mechanism, we draw conclusions only from results
that are consistent among a number of neighboring residues.
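To make the definition of the Φf value concrete, the sketch below estimates it from the quantities tabulated in Table 2. It rests on stated assumptions rather than on the authors' procedure (which is given in their Methods): the change in transition state free energy is taken as RT times the change in ln(kf), and the sign convention, with destabilizing mutations having negative ΔΔGU, is inferred from the table. The function name is hypothetical.

```python
R_KCAL = 1.987e-3   # gas constant in kcal mol^-1 K^-1
TEMPERATURE = 295.0  # K, the temperature stated in the Table 2 footnote

def phi_f(ln_kf_wt, ln_kf_mut, ddg_u):
    """Estimate the phi value from folding rates and the stability change.

    ddg_ts: change in the transition state free energy relative to the
            unfolded state, approximated as RT * (ln kf_mut - ln kf_wt).
    ddg_u:  change in overall stability in kcal/mol (negative when the
            mutation destabilizes the protein, as in Table 2).
    """
    ddg_ts = R_KCAL * TEMPERATURE * (ln_kf_mut - ln_kf_wt)
    return ddg_ts / ddg_u

# Values copied from Table 2 (wild type ln(kf) = 3.55).
phi_e30a = phi_f(3.55, 1.50, -1.94)  # tabulated value: 0.62
phi_t9a = phi_f(3.55, 3.67, -0.64)   # tabulated value: -0.11
```

Under these assumptions the two examples agree with the tabulated Φf values to within rounding; other rows can differ slightly, presumably because the published values come from the full kinetic fits.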

To facilitate presentation of our results, we have divided the
src  SH3  domain  into  five  structural  regions  and  discuss  them 
in  order  of  increasing  importance  in  the  folding  transition 
state. 

N- and C-terminal strands (strands 1 and 5) and 3₁₀-helix.
The N- and C-termini of the SH3 domain come together to
form an antiparallel β-sheet stabilized by nonlocal side
chain-side chain interactions (Fig. 1 and Table 1). A short
3₁₀-helix (PSNY, residues 57-60) precedes strand 5 and is
responsible for the 90° transition from one sheet to the other.
It is remarkable that almost all mutations in this region (12 out
of 14) either exclusively affect the unfolding rate or do not
change protein stability (Fig. 2a,b).

The extremely low Φf values (Fig. 1, Table 2) suggest that the
N-  and  C-termini  are  largely  unstructured  in  the  transition 
state  ensemble. 

RT loop. Residues 14-25 (YDYESRTETDLS) form the large,
relatively disordered RT loop (Fig. 1), which is functionally
important for binding proline-rich peptides. The crystal structure
of the src SH3 domain reveals a small stretch of regular
β-sheet pairing within the loop, as well as quite a few intraloop
hydrogen bonds involving the side chains of D15, S18, D23 and
S25 (Table 1). Hydrogen/deuterium (HD) exchange experiments
indicate, however, that this part of the molecule is flexible.
As with the N- and C-termini, almost all mutations (eight
out of nine) in the RT loop have Φf values close to 0 (Fig. 2c,
Table 2). L24A is the only mutation in the RT loop that lowers
kf, but its predominant effect is still on ku. The RT loop and the
N- and C-termini are clearly the parts of the SH3 domain that
are least structured in the transition state (Fig. 1).

Diverging type II β-turn. The transition from the RT loop to
the central three-stranded sheet formed by the n-src loop and
the distal loop β-hairpin is made by the diverging turn
(FKKGERLQ, residues 26-33) (Fig. 1). It is stabilized by
hydrophobic contacts between the central core residues, F26
and L32, and a hydrogen bond between the side chain carboxyl
of E30 and the backbone amine of K27.

Fig. 1 Structure of the src SH3 domain colored by Φf value from red (1) to
blue (0). Residues colored in white were either not mutated or the mutation
did not affect ΔG significantly. Residues colored in yellow increased or
decreased both kf and ku, suggesting that these mutations affect the transition
state more than the native state. Φf values were calculated as described
in the Methods. The image was created using Molscript.

All of the structurally important residues in this region have
intermediate Φf values (Fig. 2e, Table 2). NMR studies of an
isolated peptide with the sequence FKKGERL suggest that the
diverging turn conformation is partially populated in the
denatured state. Thus, the interactions made by the diverging
turn residues in the transition state may be greater than indicated
by the Φf values, since the reference state (the denatured
state) is already partially ordered. Recent double-mutant
experiments (V.G. and D.B., unpublished results) suggest that
the additional interactions made by the diverging turn in the
transition state include a nonlocal hydrogen bond network
involving E30 in the diverging turn and S47 and T50 in the distal
β-hairpin. The partial Φf values of the core residues F26
and L32 suggest that these residues also make some of their
interactions with hydrophobic residues in the distal loop
β-hairpin in the transition state.

N-src loop. The n-src loop (IVNNTEGDWW, residues
34-43) (Fig. 1) has an unusual shape: the two end residues, I34
and W43, are part of the hydrophobic core whereas the intervening
sequence forms a large, almost rectangular turn around
W43. W42 is only peripherally associated with the core and
together with W43 lines the peptide binding site. There is limited
local hydrogen bonding within the n-src loop, and two
nonlocal hydrogen bonds connect it to the 3₁₀-helix (Table 1).

The large number of mutations with unusual kinetic consequences
suggests that this region may adopt nonnative conformations
in the transition state (Fig. 2d, Table 2). I34 is a
central hydrophobic core residue with many neighbors (Table
1), yet neither I34A nor I34V affects stability significantly (ΔΔG
= -0.33 and -0.09 kcal mol⁻¹, respectively). Kinetic analysis shows
that the two I34 mutants slow both the folding and unfolding
rates simultaneously, suggesting that the mutations destabilize
the transition state more than the native or denatured states.
I34 appears to be critical for core formation during folding,
but strained in the native state because of slight overpacking of
the hydrophobic core in the native src SH3 domain.
Overpacking is most likely due to burial of the bulky W43 in
the native state, but not in the transition state (Φf = 0.15 for
the W43A mutation). On the solvent-exposed side, V35A is
the only mutation for which an unambiguous Φf value can be
calculated (0.77); its interaction with L44 in the distal β-hairpin
appears to be partially formed in the transition state. The
N37 side chain appears to make unfavorable interactions in
the transition state as the N37A mutation speeds both folding
and unfolding. Chain reversal at the tip of the n-src loop
appears to be important at the transition state as the G40A
mutation, which stiffens the chain, slows both kf and ku. As
mutations in I34, N37 and G40 appear to selectively stabilize
or destabilize the transition state relative to the native and
unfolded states, it seems likely that the residues in the n-src
loop are ordered at the transition state, but in a nonnative
conformation (perhaps a tight hairpin, rather than the distorted
loop present in the native state).

Fig. 2 Dependence of the rate of folding and unfolding on the denaturant
concentration for all the mutants, grouped as shown in Fig. 1. i, Serine to
alanine substitutions with unusual behavior. The data for the wild type (wt)
protein (■) is shown in all panels for comparison. The solid lines represent
the fits to the experimental data.

Distal β-hairpin. Strands 3 (LAHS, residues 44-47) and 4
(RTGYI, residues 52-56) form the distal β-hairpin, the most
regular element of secondary structure in the SH3 domain
(Fig. 1). They are connected by a tight type I β-turn and stabilized
by numerous backbone and side chain hydrogen bonds,
including an extensive network of hydrogen bonds among the
turn residues S47 and T50 and the peptide backbone (Table 1).

Mutations throughout the distal β-hairpin can be grouped
into three categories based on their effects on kinetics (Table
2). Mutations with Φf values of 0 (Fig. 2f) include H46A and
Q52A (both exposed polar residues) and G54A. Among the
residues with intermediate Φf values (Fig. 2g, Table 2), L44, T53
and Y55 interact at the solvent-exposed side of the hairpin and
appear to be only partially associated at the transition state.
I56, on the other hand, is an integral part of the hydrophobic
core and intimately involved in the transition state as judged
by the large decrease in kf upon mutation to alanine; it does,
however, make additional interactions after the transition state
as well (Φf = 0.71). Mutations with Φf values of 1 are clustered
around the turn (S47A, L48A, T50A and G51A) or are part of
the hydrophobic core (A45G) (Fig. 2h), suggesting that the
β-turn is fully formed in the transition state and the center of
the β-hairpin is associated with the hydrophobic core. As
mentioned earlier, residues S47 and T50 also make nonlocal
interactions with the diverging turn. S49 (Fig. 2i) might also
take part in this hydrogen bond network at the transition state
as the S49A mutation decreases kf; the decrease in ku brought
about by the mutation may be due to partial burial of the -OH
group in the native state without a suitable hydrogen bonding
partner.
Fig. 3 Theoretical analysis of SH3 folding. a, Ab initio simulation
of src SH3 folding using ROSETTA. The folding of the src
SH3 domain was simulated using ROSETTA as described in
Methods. All SH3 domain structures were removed from the
data base of short fragments used for building up conformations.
A total of 500 independent simulations were carried out,
and all conformations from the 20 trajectories that produced
structures within 4.5 Å r.m.s.d. of the native structure were
combined to calculate the frequency of side chain-side chain
contacts for each pair of residues in the protein (lower right
triangle; color scheme is shown below the figure). For comparison,
the contact distribution in the native structure is shown in
the upper left. b, Hierarchy of SH3 domain folding in model
calculations based on native state topology. Calculations were
performed on the src, spectrin and fyn SH3 domains and the 47-48
circular permutant of the spectrin SH3 domain in which the distal
hairpin has been cut. The reaction coordinate, Nf, is the fraction
of ordered residues (Nf = 0 is the fully unfolded state and
Nf = 1 is the fully folded state). The y-axis indicates position
along the sequence. All configurations of the system were
enumerated, and the Boltzmann averaged frequency of ordering
of each residue, as a function of Nf, is indicated by the color
(black-blue, 0-0.25; blue-magenta, 0.25-0.50; magenta-red,
0.50-0.75; red-yellow, 0.75-0.88; and yellow-white, 0.88-1.00).
The top panel was shown in Alm and Baker. It is important to
note that segments of the protein not contiguous along the
sequence still interact in the model if contacting in the
three-dimensional structure. For example, in the top panel, the high
population of the diverging turn/strand 2 and the distal loop
β-hairpin at Nf = 0.6 indicates that more surface area is buried
within and between these structural elements than within any
other substructure with the same number of residues ordered
in the protein.


Putting the pieces together, the following picture of
the transition state of the src SH3 domain emerges
(Fig. 1). The distal β-hairpin is the most ordered
structural element in the transition state. The diverging
turn and strand 2 are partially ordered and interact
with residues in the distal β-hairpin, and this effectively
constrains the n-src loop and specifies the
three-stranded topology of the central β-sheet in the protein.
The clustering of mutations that selectively stabilize or
destabilize the transition state in the vicinity of the
n-src loop (Fig. 1, yellow) suggests that the loop may
have a nonnative configuration in the transition state.
In contrast, the two terminal strands, the RT loop and
the 3₁₀-helix are mostly unstructured and contribute
few stabilizing interactions in the transition state.

Because  of  the  complexities  associated  with  inter¬ 
preting  any  one  mutation,  consistency  within  a  large 
set  of  mutations  is  critical  for  constructing  a  plausible 
picture  of  structure  in  the  transition  state.  The  seg¬ 
ment  of  sequence  between  residues  26  and  58  contains 
only  28  of  the  43  positions  for  which  mutations  signif¬ 
icantly  affected  the  rate  of  folding  and/or  unfolding, 
but  this  segment  contains  25  of  the  27  positions  with 
either (i) Φf values greater than 0.15 or (ii) mutations
that  selectively  stabilize  or  destabilize  the  transition 
state  (Table  2).  The  probability  of  such  a  partitioning  if 
the observed Φf values were randomly distributed in
the  sequence  is  1  in  758,000. 
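One way to arrive at a figure of this size is a hypergeometric tail probability: if 27 of the 43 positions were marked at random, how often would at least 25 of them land among the 28 positions of the segment? The sketch below assumes this reading; the text does not state which calculation was used, and the function name is hypothetical.

```python
from math import comb

def tail_probability(total, in_segment, marked, observed):
    """P(at least `observed` of `marked` positions fall inside the segment)
    when the marked positions are drawn at random without replacement."""
    outside = total - in_segment
    favourable = sum(
        comb(in_segment, k) * comb(outside, marked - k)
        for k in range(observed, min(in_segment, marked) + 1)
    )
    return favourable / comb(total, marked)

# Numbers from the text: 43 informative positions, 28 of them between
# residues 26 and 58, and 25 of the 27 marked positions inside that segment.
p = tail_probability(total=43, in_segment=28, marked=27, observed=25)
odds = 1.0 / p
```

Under this assumption the result is roughly 1 in 758,000, in line with the figure quoted in the text.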

Computer  simulations 

To  further  elucidate  the  hierarchy  of  structure  forma¬ 
tion  in  the  SH3  domain,  we  compare  our  experimental 
findings  with  the  results  from  two  complementary 
computational  methods:  recently  published  molecular 
dynamics  (MD)  simulations  of  src  SH3  domain 
Table 1 Interactions in the native state of src SH3 domain¹

ss H bonds⁴
33 


Residue 

Burial  % 

Hydrophobic interactions²

sm H bonds³

T9 

13 

11,31,48,  64 

β1

F10 

76 

32,  34,  37,  43,61,63 

V11 

64 

9,  13,31.64 

A12 

100 

26,  30.  32.  61 

L13

54 

11,14,  62 

Y14

44 

13,  28.  60 

D15 

48 

17.  28 

28 

Y16 

85 

19,  23,  26.  42,  56,  57,  60 

E17 

25 

15 

RT 

S18 

87 

23 

25 

loop 

R19 

8 

16,  20.  42 

23 

T20 

39 

19,  23 

22,  23 

E21 

7 

22 

T22 

26 

21.  55 

20 

D23 

79 

16,18,  20,  42,  55 

19,  20 

L24 

69 

25.  26.  32,  45.  47,  52,  56 

S25 

51 

24 

18 

F26 

90 

12,16.  24,30,  32.  56,  57,  60,61 

K27 

34 

30,  50 

30 

diverging 

K28 

30 

14.  15 

15 

turn 

G29 

22 

E30 

53 

12,  26.  27,32,  47,  49,  50 

27 

21 

9.  11,  64 

β2

L32 

94 

10,  12,  24,  26,  30,  34,  45,  47,  56,  61 

Q33 

46 

35,  46,  48 

77 

10,  32,  37,  43.45,  56,61 

V35 

51 

33,  36 

N36 

28 

35,  38 

n-src 

N37 

53 

10,  34.  43 

loop 

T38 

7 

36,  43 

43 

E39 

21 

58 

G40 

33 

D41 

43 

42 

W42 

50 

16,19,  23,41,55,  57 

W43 

92 

10,  34,  37,  38.  56,  58,  61 

38 

β3

L44 

83 

53,  55 

A45 

99 

24,  32.  34,  56 

H46 

68 

33,  48,  53 

S47

79 

24,  30,  32,  52 

49,  50 

L48 

36 

9.  33,  46 

distal 

S49 

12 

30,  50 

47 

T50 

21 

27,  30,  49,  52 

47 

hairpin 

G51 

54 

Q52 

44 

24, 47, 50

T53

53 

44.  46 

β4

G54 

94 

Y55 

62 

22,  23.  42,  44 

I56

99 

16,  24,  26,  32,  34.  43.  45,  57,  61 

P57 

96 

16,  26,  42,  56,  60 

3₁₀

helix 

S58 

87 

43 

40 

N59 

16 

60* 

Y60 

63 

14,16,26,  57,  59,61 

V61

95 

10, 12,  26,  32,  34,  43,  56,  60 

β5

A62 

77 

13,  63 

P63 

34 

10,  62 

S64 

0 

9, 11,31,65 

23 


16 


47 


30 


¹The crystal structure of the src tyrosine kinase was used as a model for the native state of
the SH3 domain (1fmk).

²Hydrophobic interactions were defined using the Voronoi procedure.

³Main chain-side chain hydrogen bonding.

⁴Side chain-side chain hydrogen bonding.


unfolding, and folding simulations of
the src SH3 domain using our ab initio
folding method, ROSETTA, which
recently showed considerable promise in
structure prediction in the CASP3
experiment (folding of the SH3 domain
starting with an unfolded polypeptide is
computationally  prohibitive  using  MD; 
ROSETTA  achieves  the  vast  speed-up 
necessary  by  simplifying  both  the  confor¬ 
mational  search  strategy  and  the  potential 
function).  The  chain  representations, 
potentials  and  conformational  sampling 
methods  used  by  the  two  approaches  are 
radically  different;  any  common  features 
observed  in  the  two  simulations  are  thus 
likely  to  reflect  properties  of  the  overall 
fold  rather  than  specific  residue-residue 
interactions. 

ROSETTA uses local structural information
from the protein data base and a
simplified potential function to fold
amino acid sequences to compact
protein-like structures (see Methods). Even for
a small protein such as the SH3 domain,
only a fraction of ROSETTA trajectories
pass through native-like conformations.
Inspection of individual successful trajectories
suggested that the order of events in
folding was quite similar to that observed
experimentally. To get a more quantitative
picture of the conformations sampled, we
identified the substructures populated
most frequently in 20 successful folding
trajectories that produced structures
within 4.5 Å r.m.s.d. (on Cα) of the native
state. The occupancy of all side chain-side
chain contacts (both native and nonnative)
was averaged over all conformations
in the 20 trajectories (Fig. 3a).
Interactions formed early in the trajectory
and persisting throughout have high
occupancy in Fig. 3a, whereas contacts
formed late have low occupancy. Thus,
while this analysis does not single out the
transition state ensemble (this could
potentially be done using the pfold
method, but would be extremely computationally
expensive), it provides information
about the overall hierarchy to folding
in the simulations. As is evident in Fig. 3a,
the distal β-hairpin, the n-src loop and the
diverging turn are highly populated during
the simulations, while the RT loop and
the sheet formed by the N- and C-terminal
strands are very rarely populated, suggesting
that they are the last elements to
be structured in the protein (the contact
map for the native protein is shown above
the diagonal for comparison).
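The occupancy analysis described above amounts to counting, for every residue pair, the fraction of sampled conformations in which the pair is in contact. The sketch below illustrates that bookkeeping only. The distance cutoff, the minimum sequence separation, and the use of one representative point per residue are assumptions made for this example and are not taken from the text; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

def contact_occupancy(conformations, cutoff=7.0, min_separation=3):
    """Fraction of conformations in which each residue pair is in contact.

    conformations: list of conformations, each a list of (x, y, z) points,
                   one representative point per residue (an assumption here).
    cutoff:        contact distance in angstroms (assumed value).
    """
    counts = Counter()
    for coords in conformations:
        for (i, a), (j, b) in combinations(enumerate(coords), 2):
            if j - i < min_separation:
                continue  # skip trivially close sequence neighbours
            dist_sq = sum((p - q) ** 2 for p, q in zip(a, b))
            if dist_sq <= cutoff ** 2:
                counts[(i, j)] += 1
    n = len(conformations)
    return {pair: count / n for pair, count in counts.items()}

# Toy data: an extended chain and a hairpin-like chain of five residues.
extended = [(0, 0, 0), (4, 0, 0), (8, 0, 0), (12, 0, 0), (16, 0, 0)]
hairpin = [(0, 0, 0), (4, 0, 0), (8, 0, 0), (8, 5, 0), (4, 5, 0)]
occupancy = contact_occupancy([extended, hairpin])
```

Contacts that form early and persist appear in most conformations and so receive high occupancy, which is why the averaged map reflects the order of structure formation without isolating the transition state itself.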
Table 2 Kinetic parameters of src SH3 folding¹

Region               Mutant  ln(kf)        ln(ku)         mf             mu              ΔΔGU (kcal mol⁻¹)  Φf
                     WT      3.55 ± 0.03   1.07 ± 0.04    1.02 ± 0.019   0.54 ± 0.013    NA                 NA
β1                   T9A     3.67 ± 0.03   2.35 ± 0.04    1.01 ± 0.037   0.46 ± 0.017    -0.64 ± 0.08       -0.11 ± 0.04
                     F10A    3.41 ± 0.03   2.39 ± 0.03    1.06 ± 0.029   0.51 ± 0.012    -0.84 ± 0.07       0.10 ± 0.03
                     F10I    3.67 ± 0.04   4.06 ± 0.03    1.08 ± 0.11    0.44 ± 0.091    -1.65 ± 0.17       -0.05 ± 0.02
                     V11A    3.45 ± 0.04   3.79 ± 0.03    0.85 ± 0.057   0.55 ± 0.033    -1.64 ± 0.12       0.03 ± 0.02
                     A12G    3.46 ± 0.03   2.73 ± 0.05    1.08 ± 0.049   0.47 ± 0.026    -1.00 ± 0.09       0.05 ± 0.02
                     L13A    3.62 ± 0.02   3.87 ± 0.06    1.20 ± 0.098   0.36 ± 0.02     -1.49 ± 0.13       -0.03 ± 0.01
RT loop              Y14A    3.59 ± 0.03   1.68 ± 0.03    0.96 ± 0.042   0.50 ± 0.011    -0.31 ± 0.06       -0.08 ± 0.09
                     D15A    3.70 ± 0.03   2.02 ± 0.08    0.98 ± 0.051   0.41 ± 0.03     -0.43 ± 0.13       -0.22 ± 0.10
                     Y16A    3.40 ± 0.07   4.94 ± 0.06    -²             0.39 ± 0.018    -2.27 ± 0.26       0.03 ± 0.03
                     Y16F    3.54 ± 0.04   1.37 ± 0.02    0.97 ± 0.036   0.65 ± 0.093    -0.18 ± 0.10       -³
                     S18A    3.80 ± 0.04   0.47 ± 0.06    0.96 ± 0.028   0.55 ± 0.051    0.52 ± 0.10        0.28 ± 0.06
                     R19A    3.64 ± 0.04   1.22 ± 0.07    1.08 ± 0.034   0.61 ± 0.017    -0.07 ± 0.08       -³
                     T20A    3.69 ± 0.03   1.10 ± 0.04    1.01 ± 0.033   0.52 ± 0.017    0.06 ± 0.07        -³
                     T22A    3.60 ± 0.03   1.14 ± 0.03    1.00 ± 0.03    0.52 ± 0.016    -0.01 ± 0.07       -³
                     D23A    3.43 ± 0.05   1.91 ± 0.07    1.06 ± 0.083   0.51 ± 0.043    -0.56 ± 0.13       0.13 ± 0.07
                     L24A    2.76 ± 0.03   3.44 ± 0.03    1.45 ± 0.06    0.45 ± 0.0089   -1.79 ± 0.09       0.26 ± 0.01
                     S25A    3.49 ± 0.04   2.40 ± 0.03    0.95 ± 0.041   0.58 ± 0.04     -0.82 ± 0.08       0.03 ± 0.04
diverging turn       F26A    2.17 ± 0.03   3.21 ± 0.04    -²             0.40 ± 0.0096   -1.97 ± 0.10       0.40 ± 0.01
                     K27A    3.60 ± 0.04   1.79 ± 0.05    1.01 ± 0.051   0.65 ± 0.045    -0.44 ± 0.11       -0.06 ± 0.09
                     K28A    3.73 ± 0.03   1.45 ± 0.04    0.88 ± 0.017   0.49 ± 0.022    -0.09 ± 0.07       -³
                     G29A    2.30 ± 0.04   2.74 ± 0.03    1.54 ± 0.088   0.43 ± 0.017    -1.66 ± 0.12       0.44 ± 0.02
                     E30A    1.50 ± 0.03   2.34 ± 0.03    1.09 ± 0.054   0.65 ± 0.027    -1.94 ± 0.13       0.62 ± 0.02
                     R31A    3.43 ± 0.03   1.45 ± 0.10    1.00 ± 0.06    0.57 ± 0.035    -0.32 ± 0.12       0.23 ± 0.08
                     L32A    1.40 ± 0.08   2.61 ± 0.11    -²             -²              -2.26 ± 0.37       0.55 ± 0.05
                     L32V    3.10 ± 0.04   2.68 ± 0.03    1.03 ± 0.05    0.56 ± 0.031    -1.21 ± 0.11       0.22 ± 0.02
                     Q33A    3.17 ± 0.04   1.00 ± 0.04    1.11 ± 0.033   0.55 ± 0.027    -0.21 ± 0.09
n-src loop           I34A    1.40 ± 0.04   -0.63 ± 0.03   1.39 ± 0.027   0.51 ± 0.03     -0.32 ± 0.12
                     I34V    3.09 ± 0.05   0.75 ± 0.09    1.16 ± 0.093   0.50 ± 0.019    -0.09 ± 0.12       -⁴
                     V35A    2.6 ± 0.06    1.33 ± 0.05    1.27 ± 0.046   0.59 ± 0.041    -0.77 ± 0.12       0.77 ± 0.05
                     N36A    3.64 ± 0.05   1.50 ± 0.05    1.12 ± 0.035   0.47 ± 0.017    -0.20 ± 0.09       -³
                     N37A    3.91 ± 0.03   1.30 ± 0.04    0.94 ± 0.032   0.59 ± 0.023    0.07 ± 0.06        -⁴
                     G40A    2.99 ± 0.05   1.02 ± 0.06    0.92 ± 0.037   0.55 ± 0.036    -0.28 ± 0.08       -⁴
                     W42A    3.03 ± 0.05   2.83 ± 0.02    1.62 ± 0.048   0.45 ± 0.0084   -1.29 ± 0.10       0.25 ± 0.03
                     W43A    3.24 ± 0.03   2.97 ± 0.06    1.16 ± 0.069   0.35 ± 0.025    -1.20 ± 0.11       0.15 ± 0.02
                     W43I    4.45 ± 0.05   3.30 ± 0.03    1.10 ± 0.081   0.62 ± 0.073    -0.77 ± 0.13       -⁴
distal hairpin (β3)  L44A    2.10 ± 0.05   2.52 ± 0.05    1.88 ± 0.065   0.41 ± 0.0095   -1.64 ± 0.15       0.54 ± 0.03
                     A45G    1.71 ± 0.07   0.86 ± 0.06    1.70 ± 0.044   0.39 ± 0.011    -0.92 ± 0.15       1.20 ± 0.08
                     H46A    3.47 ± 0.03   2.00 ± 0.02    0.99 ± 0.029   0.58 ± 0.016    -0.62 ± 0.06       0.08 ± 0.04
                     S47A    1.23 ± 0.04   1.29 ± 0.03    1.50 ± 0.035   0.44 ± 0.0074   -1.46 ± 0.09       0.95 ± 0.03
distal hairpin       L48A    2.82 ± 0.04   1.38 ± 0.03    1.20 ± 0.059   0.51 ± 0.019    -0.61 ± 0.08       0.72 ± 0.04
                     S49A    2.98 ± 0.06   0.07 ± 0.06    1.20 ± 0.061   0.61 ± 0.026    0.18 ± 0.11        -⁴
                     T50A    0.99 ± 0.05   1.59 ± 0.02    1.84 ± 0.054   0.47 ± 0.012    -1.79 ± 0.10       0.86 ± 0.02
                     G51A    1.39 ± 0.04   1.02 ± 0.06    1.51 ± 0.11    0.41 ± 0.019    -1.21 ± 0.14       1.06 ± 0.06
distal hairpin (β4)  Q52A    3.29 ± 0.03   1.41 ± 0.08    1.03 ± 0.059   0.49 ± 0.021    -0.35 ± 0.12       0.45 ± 0.09
                     T53A    2.33 ± 0.06   1.65 ± 0.04    1.38 ± 0.048   0.56 ± 0.021    -1.11 ± 0.11       0.68 ± 0.03
                     G54A    3.79 ± 0.02   4.59 ± 0.05    1.60 ± 0.17    0.36 ± 0.02     -1.81 ± 0.12       -0.08 ± 0.01
                     Y55A    2.10 ± 0.04   2.34 ± 0.04    1.64 ± 0.063   0.39 ± 0.0062   -1.52 ± 0.10       0.56 ± 0.02
                     I56A    1.34 ± 0.03   2.02 ± 0.02    1.64 ± 0.067   0.46 ± 0.016    -1.84 ± 0.10       0.71 ± 0.02
3₁₀ helix            P57A    2.98 ± 0.04   2.89 ± 0.04    1.36 ± 0.098   0.45 ± 0.014    -1.36 ± 0.11       0.24 ± 0.02
                     S58A    4.06 ± 0.03   1.14 ± 0.05    0.99 ± 0.034   0.58 ± 0.023    0.24 ± 0.08        -⁴
                     N59A    3.55 ± 0.03   1.28 ± 0.04    0.87 ± 0.041   0.58 ± 0.016    -0.14 ± 0.07
                     Y60A    3.48 ± 0.05   0.65 ± 0.04    1.06 ± 0.034   0.47 ± 0.034    0.23 ± 0.09        -³
β5                   V61A    3.67 ± 0.04   3.29 ± 0.02    1.15 ± 0.044   0.44 ± 0.013    -1.18 ± 0.09       -0.06 ± 0.03
                     A62G    3.59 ± 0.04   1.97 ± 0.03    1.12 ± 0.045   0.55 ± 0.019    -0.53 ± 0.08       -0.02 ± 0.07
                     P63A    3.64 ± 0.04   1.44 ± 0.08    1.04 ± 0.062   0.48 ± 0.031    -0.14 ± 0.11       -³
                     S64A    3.66 ± 0.03   0.40 ± 0.04    0.95 ± 0.026   0.59 ± 0.018    0.44 ± 0.06        0.14 ± 0.05
                     D65A    3.69 ± 0.03   0.98 ± 0.05    0.92 ± 0.032   0.57 ± 0.02     0.13 ± 0.07        -³

¹All experiments were done at pH 6 and 295 K. Kinetic measurements were done by stopped-flow fluorescence. Rate of folding (kf) is reported at 0.3 M Gnd; rate
of unfolding (ku) is reported at 4 M Gnd to avoid extrapolation. ΔΔGU, Φf and standard errors were calculated as described in the Methods section.

²Parameters could not be reliably measured.

³Mutation has no (or very small) effect on stability, that is, ΔΔGU < 0.20 kcal mol⁻¹.

⁴Mutation either increases or decreases both kf and ku.
High-temperature MD unfolding simulations have provided
insights into the folding of a number of small proteins. Tsai
et al. carried out 30 independent simulations of src SH3
domain unfolding, and analyzed the order in which the structural
elements are disrupted in the unfolding process. Overall,
the hierarchy of unfolding was consistent with, but less
pronounced than, the hierarchy observed in the experiments and
in the ab initio simulations: the interactions between the N-
and C-terminal strands were lost earlier than those within the
distal loop β-hairpin, the n-src loop, and between the diverging
turn and the distal loop β-hairpin.

Although the overall features of the simulations were consistent
with the experimental results, there also were some inconsistencies.
While residues in the three-stranded sheet made
extensive contacts in the ab initio simulations and have high Φf
values in the experiments, the C-terminus also made numerous
contacts in the simulations (Fig. 3a), but contains mainly
low Φf values. In the MD simulations, the RT loop remained
ordered until quite late in the unfolding process. These discrepancies
notwithstanding, the overall concordance between
the simulations and the experiments is quite intriguing given
that the MD simulations were carried out at 498 K and the
ab initio simulations do not explicitly model side chains, and
suggests that the hierarchy of folding of the SH3 domain is
determined by fairly coarse-grained features of the structure.

Native  state  topology-based  model  calculations 

To isolate those features of the native topology responsible for
determining folding mechanisms, we recently developed a
simple native state topology-based model of the folding free
energy landscape and folding process. In this model, the
folding landscape is approximated by considering only conformations
in which each residue is either ordered as in the native
structure or completely disordered, and all ordered residues
occur in one or two contiguous stretches of the protein
sequence. The free energy of each of these conformations is
determined by the balance between attractive native interactions,
taken to be proportional to the surface area buried within
the ordered region in the native structure, and the entropic
cost of chain ordering, a function of the number of residues
ordered and the loop length between the ordered segments.

We use this simple approach to model the folding free energy
landscape of the three SH3 domains whose folding mechanisms
have been probed by mutation: the src SH3 domain, the
fyn SH3 domain (A. Davidson, pers. comm.), and the α-spectrin
SH3 domain. As a control, the same calculations were
carried out on the α-spectrin SH3 permutant found by
Serrano and coworkers to have a significantly changed folding
transition state. A natural reaction coordinate in this
model is simply the fraction of residues ordered as in the
native state, Nf. To determine the order in which the different
parts of the protein fold as Nf increases, we enumerated all
configurations allowed by the model with a particular value of
Nf, determined their free energies, and computed the
Boltzmann weighted frequency of ordering each residue (Fig.
5b). Close to the unfolded state (Nf = 0) most residues have low
frequency of ordering (black color), while close to the native
state (Nf = 1) almost all residues are ordered (white color).
There are interesting similarities in the hierarchies of structure
formation in the three native SH3 domains obtained with this
model (Fig. 5b). The first regions of the proteins to become
ordered are the three hairpin loops (the distal loop β-hairpin,
the RT loop and the n-src loop). By Nf ≈ 0.5, the predominant
region ordered in all three proteins is the three-stranded sheet
formed by the distal loop β-hairpin and the n-src loop. The
decrease in the relative population of the RT loop occurs
because ordering additional residues increases the entropic
cost of structure formation without significant increases in the
attractive native interactions; in contrast, the ordering of the
residues in the three-stranded sheet formed by the n-src loop
and the distal loop β-hairpin produces significant gains in
attractive interactions (the three-stranded sheet has a much
higher density of stabilizing interactions than other portions
of the protein of similar length). For all three proteins, the first
and last strands become ordered very late in the folding
process, consistent with the fact that they are stabilized primarily
by nonlocal interactions. The similarities of the plots
are consequences of the similarities in the topology of the
three proteins. Notably, the hierarchy of folding is significantly
altered in the SH3 domain circular permutant (Fig. 5b).
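
The enumeration step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it lists every configuration with a fixed number of ordered residues arranged in one or two contiguous stretches and computes the Boltzmann-weighted ordering frequency of each residue. The function names, the toy contact list, the energy of -1 kcal/mol per contact and the RT value (about 0.586 kcal/mol at 295 K) are all hypothetical choices made for the example.

```python
import math

def segment_configs(n_res, n_ordered):
    """Yield tuples of ordered residue indices forming one or two contiguous stretches."""
    # One contiguous segment.
    for start in range(n_res - n_ordered + 1):
        yield tuple(range(start, start + n_ordered))
    # Two segments separated by at least one disordered residue.
    for len1 in range(1, n_ordered):
        len2 = n_ordered - len1
        for s1 in range(n_res):
            e1 = s1 + len1  # exclusive end of the first segment
            for s2 in range(e1 + 1, n_res - len2 + 1):
                yield tuple(range(s1, e1)) + tuple(range(s2, s2 + len2))

def ordering_frequency(n_res, n_ordered, free_energy, rt=0.586):
    """Boltzmann-weighted probability that each residue is ordered (rt is an assumed value)."""
    weights = [0.0] * n_res
    z = 0.0
    for cfg in segment_configs(n_res, n_ordered):
        w = math.exp(-free_energy(cfg) / rt)
        z += w
        for i in cfg:
            weights[i] += w
    return [w / z for w in weights]

# Hypothetical toy energy: -1 kcal/mol for each native contact inside the ordered set.
TOY_CONTACTS = {(1, 2), (2, 3), (1, 4)}

def toy_free_energy(cfg):
    ordered = set(cfg)
    n_contacts = sum(1 for i, j in TOY_CONTACTS if i in ordered and j in ordered)
    return -1.0 * n_contacts

freq = ordering_frequency(6, 3, toy_free_energy)
```

Because every configuration in the sum has the same number of ordered residues, the per-residue frequencies add up to that number, which is a convenient sanity check.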

Overall, the hierarchy of structure formation observed in
this simple model is consistent with the experimental results
(Fig. 1): the residues with high Φf values in the src SH3
domain lie in the central three-stranded sheet, and for α-spectrin
and fyn (A. Davidson, pers. comm.), mutations in the
distal loop β-hairpin have high Φf values. The lack of treatment
of local sequence structure propensities may account for
the roughly equal tendencies of the n-src loop and the distal
loop β-hairpin to form in the model; the higher Φf values in
the distal loop β-hairpin in src and α-spectrin may reflect
more complete ordering of this structure in the transition state
due to stabilizing local interactions such as hydrogen bonds
not included in the model. It should be emphasized that the
two-segment model does not simply identify the longest contiguous
stretch of interacting residues; for example, in barnase
the N-terminal helix is correctly predicted to associate with
the C-terminal sheet in the transition state.

In summary, the similarity in the hierarchy of folding
observed experimentally in the src and α-spectrin SH3
domains, in the ab initio and MD simulations, and in the simple
model calculations suggests that the folding mechanism of
SH3 domains is largely determined by the topology of the
native protein. The success of the simple model in reproducing
the hierarchy observed both experimentally and in the simulations
suggests that the folding process of this protein is largely
determined by the balance between the entropic cost of chain
ordering and the formation of attractive native interactions;
nonnative interactions and conformations (that is, kinetic
traps) appear to play a relatively minor role in shaping the
folding process. The structural polarization of the SH3
domain folding transition state can be viewed as a consequence
of the low free energy cost of ordering the low contact
order central three-stranded sheet, relative to the much higher
contact order sheet formed by the N- and C-termini together
with the RT loop. The importance of the computational
work described in this paper in supporting this hypothesis
may be seen by considering the alternative hypothesis that
structural polarization in the transition state ensemble is a
consequence of inhomogeneities in inter-residue interaction
strengths: the strongest interactions are the last to break during
unfolding and the most likely to nucleate the refolding
process. Since the distal loop β-hairpin has the most extensive
intraloop hydrogen bonding, if only the experimental data
were available it could equally well be argued that the origin of
structural polarization of the SH3 transition state was the
greater stabilization of the distal loop β-hairpin relative to the
other structural elements in the protein. The ab initio folding
simulations, however, have no prior knowledge that the interactions
within the distal loop β-hairpin are stronger than in
the other loops, and the simple free energy landscape model
does not consider hydrogen bonding at all. Thus, the fact that
a similar hierarchy of structure formation is observed in the
calculations and experiments helps to distinguish between two
hypotheses that are equally consistent with the experimental
data.

The accompanying papers from the Dobson and Serrano
groups strongly support the idea that native state topology is a
dominant determinant of protein folding mechanisms. Martinez
and Serrano show that the folding transition state of the
α-spectrin SH3 domain is similar to that of the src SH3 domain
and is not significantly altered by changes in pH that produce
large changes in stability. Chiti et al. show that folding transition
state structure is conserved in a second pair of proteins with
similar native structures but with only 13% sequence identity:
acylphosphatase and the activation domain of procarboxypeptidase
A2. Chiti et al. also show that the correlation between folding
rates and contact order observed among two-state folding
proteins generally also holds within a set of five nonhomologous
proteins that exhibit the AcP topology.

The combination of experiment, simulation and theory
employed in this paper, together with comparisons of the folding
of structurally related proteins such as those in the accompanying
papers, has the potential to distinguish the robust
features of the folding process from those dependent on
high-resolution detail, and to trace the origins of these robust
features to basic physical principles. We believe that this
integration of complementary approaches will be critical for
obtaining a complete understanding of the folding process.

Methods 

Mutagenesis. Mutagenesis was accomplished using the Quick
Change site-directed mutagenesis kit (Stratagene). Plasmids harboring
the point mutations were transformed into BL21 cells, and
protein was overexpressed and purified. The His tag was not
removed for the purposes of this study. All mutants were
sequenced to ensure that the mutagenesis was successful and the
purified proteins were analyzed by mass spectrometry to confirm
that each mutation was the expected one.

Biophysical analysis. In all experiments, protein solutions were
made in 50 mM sodium phosphate, pH 6, and the temperature
was held constant at 295 K. The stability of the point mutants was
assessed by guanidine (Gnd) denaturation using either circular
dichroism (CD) or fluorescence as reported previously. The kinetics of
folding and unfolding were followed by fluorescence on a Bio-Logic
SFM-4 stopped-flow instrument. The unfolding reaction for the
wild type protein was determined to behave as a two-state
process, and the kinetic and equilibrium data for the mutants
were fit to a two-state model. Equilibrium data (not shown) were
generally in agreement with the kinetic estimates of stability for
the less destabilized mutants.

Φ value analysis. Φf values were calculated only for mutants
that were destabilized by more than 0.2 kcal mol⁻¹ relative to the
wild type protein. In order to avoid extrapolations, we compared
folding rates at 0.3 M Gnd and unfolding rates at 4 M Gnd. In calculating
ΔΔG for each mutant we assumed that it is independent
of the denaturant concentration, which is warranted since the m
values for the mutants are not very different from wild type. ΔΔG
and Φf were computed using
$\Delta\Delta G = -RT\,(\ln(k_{0.3M}^{wt}/k_{0.3M}^{mut}) + \ln(k_{4M}^{mut}/k_{4M}^{wt}))$
and $\Phi_f = -RT \ln(k_{0.3M}^{wt}/k_{0.3M}^{mut})/\Delta\Delta G$. For error
analysis, we decided on a procedure that makes use of the many
independent measurements in the linear portions of the V curves
shown in Fig. 2. The estimates and confidence regions for
$\ln(k_{0.3M}^{wt}/k_{0.3M}^{mut})$ and $\ln(k_{4M}^{mut}/k_{4M}^{wt})$ were obtained by simultaneously
fitting the linear portions of the mutant and wild type
V curves to $\ln k_f(\mathrm{Gnd})^{wt} = \ln k_f(\mathrm{Gnd})^{mut} + \delta_f$ and
$\ln k_u(\mathrm{Gnd})^{mut} = \ln k_u(\mathrm{Gnd})^{wt} + \delta_u$. The error estimates for the Φf values presented
in Table 2 represent 95% confidence intervals (roughly twice the
standard deviation) for the Φf value, generated by repeatedly
(10,000 times) sampling from the δf and δu distributions and
recomputing the Φf values using Φf = δf/(δf + δu).
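
The resampling procedure can be illustrated with a short Python sketch. It is not the authors' code: it assumes that δf = ln(kf^wt/kf^mut) and δu = ln(ku^mut/ku^wt), that both are normally distributed with the standard errors returned by the fit (the source does not state the form of the distributions), and that the 95% interval is read off the 2.5th and 97.5th percentiles. The input numbers are hypothetical.

```python
import random

R_KCAL = 1.987e-3   # gas constant, kcal mol^-1 K^-1
T_KELVIN = 295.0    # temperature used in the experiments

def ddg(delta_f, delta_u):
    """Stability change, -RT(delta_f + delta_u); negative for a destabilized mutant."""
    return -R_KCAL * T_KELVIN * (delta_f + delta_u)

def phi_f(delta_f, delta_u):
    """Phi value as the folding-rate share of the total stability change."""
    return delta_f / (delta_f + delta_u)

def phi_confidence_interval(delta_f, se_f, delta_u, se_u, n_samples=10_000, seed=0):
    """Resample delta_f and delta_u (assumed Gaussian) and return (phi, (low, high))."""
    rng = random.Random(seed)
    samples = sorted(
        phi_f(rng.gauss(delta_f, se_f), rng.gauss(delta_u, se_u))
        for _ in range(n_samples)
    )
    low = samples[int(0.025 * n_samples)]
    high = samples[int(0.975 * n_samples) - 1]
    return phi_f(delta_f, delta_u), (low, high)

# Hypothetical fitted offsets and standard errors, for illustration only.
phi, (low, high) = phi_confidence_interval(1.5, 0.05, 1.0, 0.05)
```

Sampling the ratio directly, rather than propagating errors analytically, keeps the interval meaningful when δf + δu is small and the ratio becomes strongly nonlinear.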

Ab initio folding simulations. The ab initio folding method,
ROSETTA, utilizes a backbone plus side chain centroid-based representation
of the chain; local interactions are satisfied by building
structures up from short (three- and nine-residue) segments
of known structures with sequences similar to those of the
sequence being folded, while the nonlocal interactions that stabilize
proteins are treated using a low-resolution scoring function
with terms representing hydrophobic burial, strand pairing and
specific pair interactions such as charge pairing and disulfide
bonding²⁵. A Monte Carlo simulated annealing strategy is used to
sample conformational space; a move consists of a substitution of
a three- or nine-residue segment of the chain by a randomly chosen
fragment from a known structure with a similar local
sequence. The protocol used to simulate the SH3 domain folding
here was the same as that used in our CASP3 structure predictions²³.
All SH3 domain structures were removed from the database
of short fragments used for building up conformations.
Because of the very large size of the conformational space, the
trajectories and final structures for different runs can vary considerably.
A total of 500 independent simulations were carried out,
and all conformations from the 20 trajectories that produced
structures within 4.5 Å r.m.s.d. of the native structure were combined.
The frequency of each contact (defined as a pair of side
chain centroids within 8 Å) in the pooled set of conformations
was then computed.
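
The contact-frequency calculation at the end of this protocol can be sketched as follows. This is an illustrative Python version, not the ROSETTA code: it assumes each conformation is supplied as a list of side chain centroid coordinates in Å, counts a pair as in contact when the centroids are at most 8 Å apart, and makes no attempt to exclude sequence neighbors (the source does not say whether they were excluded). The function name and the toy coordinates are hypothetical.

```python
import math
from collections import Counter

def contact_frequencies(conformations, cutoff=8.0):
    """Fraction of pooled conformations in which each residue pair is in contact."""
    counts = Counter()
    for centroids in conformations:
        n = len(centroids)
        for i in range(n):
            for j in range(i + 1, n):
                # A contact is a pair of centroids no farther apart than the cutoff.
                if math.dist(centroids[i], centroids[j]) <= cutoff:
                    counts[(i, j)] += 1
    total = len(conformations)
    return {pair: c / total for pair, c in counts.items()}

# Two toy three-residue conformations (coordinates in Å, invented for the example).
pooled = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)],
]
freqs = contact_frequencies(pooled)
```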

Simple model calculations. The folding free energy landscape
of the SH3 domain was approximated using the two-segment
model described previously. The free energy landscapes of the src, spectrin
and fyn SH3 domains, and the 47-48 circular permutant of the
spectrin SH3 domain were approximated by considering only configurations
in which (i) each residue is fully ordered as in the
native state or fully disordered, and (ii) the ordered residues
occur in one or two contiguous stretches of the sequence. The
free energy of each configuration was computed from the equation
$F = -\gamma\,\Delta ASA + \alpha RTN + \beta RT \ln(\Delta L)$; all parameters were taken
from the literature or from simple off-lattice calculations. In the
first term, which represents the favorable interactions made in
the partially ordered configuration, ΔASA is the difference in
exposed surface area between the partially ordered configuration
and the unfolded state (estimated from the sum of the
native tripeptide surface areas) and γ = 16 cal mol⁻¹ Å⁻². In the second
term, which represents the entropic cost of ordering each
residue in the ordered segments, N is the number of residues
ordered and α = 2.9. In the third term, which represents the
entropic cost of closing the loop between the two ordered segments,
ΔL is the length of the loop and β = 1.8.
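
As a worked illustration of this free energy expression, the Python sketch below evaluates F for a single configuration. It is a minimal example under assumptions, not the authors' program: the buried area is passed in as a positive number of Å², the temperature is taken to be 295 K (the text does not state the temperature used in the model), γ is converted to kcal mol⁻¹ Å⁻², and the loop term is omitted when the ordered residues form one segment. The function name and the example inputs are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal mol^-1 K^-1

def config_free_energy(buried_area, n_ordered, loop_length=None,
                       gamma=0.016, alpha=2.9, beta=1.8, temperature=295.0):
    """F = -gamma*dASA + alpha*RT*N + beta*RT*ln(dL), in kcal/mol.

    buried_area : surface area buried in the ordered region (Å^2, positive)
    n_ordered   : number of ordered residues, N
    loop_length : residues in the loop between two ordered segments,
                  or None for a single-segment configuration
    """
    rt = R_KCAL * temperature
    f = -gamma * buried_area + alpha * rt * n_ordered
    if loop_length is not None:
        f += beta * rt * math.log(loop_length)
    return f

# Hypothetical configuration: 20 ordered residues burying 1500 Å^2, 10-residue loop.
f_two_segments = config_free_energy(1500.0, 20, loop_length=10)
f_one_segment = config_free_energy(1500.0, 20)
```

With these inputs the chain-ordering term outweighs the burial term, which shows why small ordered regions are only populated when they bury a large surface area per residue.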
One of the outstanding questions in protein folding concerns the
degree of heterogeneity in the folding transition state ensemble:
does a protein fold via a large multitude of diverse "pathways," or
are the elements of native structure assembled in a well defined
order? Herein, we build on previous point mutagenesis studies of
the src SH3 by directly investigating the association of structural
elements and the loss of backbone conformational entropy during
folding. Double-mutant analysis of polar residues in the distal
β-hairpin and the diverging turn indicates that the hydrogen bond
network between these elements is largely formed in the folding
transition state. A 10-glycine insertion in the n-src loop (which
connects the distal hairpin and the diverging turn) and a disulfide
crosslink at the base of the distal β-hairpin exclusively affect the
folding rate, showing that these structural elements are nearly as
ordered in the folding transition state as in the native state. In
contrast, crosslinking the base of the RT loop or the N and C termini
dramatically slows down the unfolding rate, suggesting that dissociation
of the termini and opening of the RT loop precede the
rate-limiting step in unfolding. Taken together, these results suggest
that essentially all conformations in the folding transition
state ensemble have the central three-stranded β-sheet formed,
indicating that, for the src homology 3 domain, there is a discrete
order to structure assembly during folding.

One of the major differences between the “old” and the
“new” views of protein folding is the degree of heterogeneity
in the folding transition state ensemble (1). In the limit of
a single folding pathway, all folding trajectories are presumed to
undergo similar conformational transitions, whereas in the limit
of a perfectly symmetric folding “funnel,” a vast number of
different trajectories lead to the native state. Studies of simple
lattice models have suggested both “single” and “multiple”
folding nuclei scenarios (2, 3). For small proteins that fold
without detectable intermediates, characterization of the folding
transition state by mutational analysis is perhaps the best available
approach to addressing this issue. This method, pioneered
by Fersht and coworkers (4, 5), has proven extremely powerful
in providing site-specific information about structure at the
rate-limiting step for folding (6-11). It probes the formation of
side chain-side chain interactions in the transition state by
deleting parts of individual residues and assessing the effect on
folding kinetics. There are, however, two shortcomings of this
approach: (i) residues are often involved in multiple interactions,
and point mutagenesis does not distinguish which of these are
important in the transition state; and (ii) the conformation of the
peptide backbone can be deduced only indirectly. To go beyond
these limitations, in the present study, we employ double-mutant
analysis to probe side chain-side chain interactions between
structural elements and a 10-glycine insertion and disulfide
crosslinks to test backbone ordering and the association of entire
structural elements at the transition state.

Previously, we studied the structure of the transition state for
folding of the 57-residue src SH3 domain by characterizing the
kinetic consequences of a large number of point mutations and
found that the rate-limiting step in folding involves formation of
the distal β-hairpin and the diverging turn (Fig. 1A; refs. 12 and
13). In this study, we investigate long-range order in the transition
state by: (i) double-mutant analysis of a hydrogen bond
network between the distal β-hairpin and the diverging turn to
probe their association in the transition state; (ii) a 10-glycine
insertion in the n-src loop to investigate its conformational
rigidity and thus the association of the distal β-hairpin and the
diverging turn, which the n-src loop connects; (iii) disulfide
crosslinking the distal β-hairpin and the RT loop to probe the
extent of closure of the two hairpin loops; and (iv) disulfide
crosslinking of the N and C termini to probe the association of
the terminal strands in the transition state.

Methods 

Mutagenesis.  Point  mutagenesis  was  accomplished  with  the 
Quick  Change  Site-Directed  mutagenesis  kit  (Stratagene).  The 
glycine  insertion  mutant  was  constructed  by  using  PCR  cassette 
mutagenesis  with  primers  coding  for  the  10  glycines.  Plasmids 
harboring  the  mutations  were  transformed  into  BL21  cells,  and 
protein  was  overexpressed  and  purified  (12).  The  His  tag  was  not 
removed  for  the  purposes  of  this  study.  All  mutants  were 
sequenced  to  ensure  that  the  mutagenesis  was  successful,  and  the 
purified  proteins  were  analyzed  by  mass  spectrometry  to  confirm 
protein  identity. 

Disulfide Crosslinking. Residues mutated to cysteine were chosen
to satisfy the geometric requirements for disulfide bond formation:
Cα-Cα (4.4-6.8 Å), Cβ-Cβ (3.5-4.5 Å), and dihedral angle
close to 90° (14). The residues chosen were previously determined
to have a very small or no effect on the rate of folding and
stability (13). W43C is the only mutation that is likely to affect
stability significantly (as judged from the W43A mutation,
ΔΔG = 1.2 kcal/mol). Disulfide bonds were oxidized in the
presence of 20 mM K3Fe(CN)6 for 10 min at room temperature.
Reactions were performed in the dark because K3Fe(CN)6 is
light sensitive. Disulfide formation was confirmed with Ellman’s
reagent.

Abbreviations: SH3, src homology 3; WT, wild type; Gnd, guanidine.

Fig. 1. (A) Structure of the src SH3 domain (1fmk.pdb) colored by previously reported ΦF-values (13) on a continuous scale from red (ΦF = 1) to blue (ΦF = 0).
Residues at which mutations increase or decrease both kf and ku are colored in yellow. The graphic was generated with molscript (33) and raster3d (34, 35). (B)
Structure of the hydrogen bond network between the distal β-hairpin and the diverging turn (midas; refs. 36 and 37). Residues included in the double-mutant
cycles are shown in red.

Biophysical Analysis. Protein solutions (100 μM) were made in 50
mM sodium phosphate (pH 6), and the temperature was held
constant at 295 K. For wild type (WT) and the S47A mutant,
experiments were also performed in 50 mM NaPi (pH 3) at 295
K. To reduce the disulfide-crosslink mutants, they were incubated
in 10 mM DTT for 1 h, and the same concentration of
reducing agent was present throughout the kinetic experiments.
The kinetics of folding and unfolding were followed by tryptophan
fluorescence on a Bio-Logic SFM-4 stopped-flow instrument
(Molecular Kinetics, Pullman, WA). The unfolding reaction
for the WT protein can be modeled as a two-state process
(15), and the kinetic and equilibrium data for the mutants were
fit to a two-state model. For each mutant, the free energy of
folding is calculated as:

$$\Delta G_{U\text{-}F} = RT \ln(k_f / k_u), \tag{1}$$

where kf and ku are the rates of folding and unfolding, respectively,
in the absence of denaturant. The differences in the free
energy of folding (ΔΔGU-F) and in the folding activation energy
(ΔΔGU-‡) between the WT protein and each mutant are calculated
as:

$$\Delta\Delta G_{U\text{-}F} = RT\,[\ln(k_f^{mut}/k_f^{WT}) + \ln(k_u^{WT}/k_u^{mut})] \quad \text{and}$$

$$\Delta\Delta G_{U\text{-}\ddagger} = RT \ln(k_f^{mut}/k_f^{WT}), \tag{2}$$

where kf and ku are the rates of folding and unfolding, respectively,
at denaturant concentrations experimentally accessible
for that mutant. The parameter ΦF is defined as

$$\Phi_F = \Delta\Delta G_{U\text{-}\ddagger} / \Delta\Delta G_{U\text{-}F} \tag{3}$$

and is interpreted as the fraction of the mutated residue’s
interactions that are formed in the transition state. A ΦF-value
of 1 indicates that all of a residue’s interactions are formed in the
transition state, whereas a ΦF of 0 means that the residue does
not make stabilizing interactions in the transition state (5). In the
case of the double mutants, a ΦF-value for the pairwise interaction
can be determined similarly:

$$\Phi_F^{int} = \Delta\Delta G_{U\text{-}\ddagger}^{int} / \Delta\Delta G_{U\text{-}F}^{int}. \tag{4}$$
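
A short numerical example may help make these definitions concrete. The Python sketch below is illustrative only: it assumes a sign convention in which a destabilizing mutation gives negative ΔΔG values (chosen so that the result matches the sign of the entries in Table 1), takes R = 1.987 × 10⁻³ kcal mol⁻¹ K⁻¹ and T = 295 K, and uses the ln(kf) and ln(ku) entries listed for WT and E30A in Table 1. The function names are hypothetical.

```python
RT = 1.987e-3 * 295.0  # kcal/mol at 295 K

def ddg_folding(ln_kf_wt, ln_ku_wt, ln_kf_mut, ln_ku_mut):
    """Change in folding free energy (kcal/mol); negative if the mutant is destabilized."""
    return RT * ((ln_kf_mut - ln_kf_wt) + (ln_ku_wt - ln_ku_mut))

def ddg_activation(ln_kf_wt, ln_kf_mut):
    """Change in folding activation free energy (kcal/mol), same sign convention."""
    return RT * (ln_kf_mut - ln_kf_wt)

def phi_value(ln_kf_wt, ln_ku_wt, ln_kf_mut, ln_ku_mut):
    """Ratio of the activation change to the total stability change."""
    return (ddg_activation(ln_kf_wt, ln_kf_mut)
            / ddg_folding(ln_kf_wt, ln_ku_wt, ln_kf_mut, ln_ku_mut))

# ln(kf) without denaturant and ln(ku) in 3 M Gnd, as listed in Table 1.
WT = (4.10, 0.139)
E30A = (2.03, 1.48)

ddg_e30a = ddg_folding(WT[0], WT[1], E30A[0], E30A[1])
phi_e30a = phi_value(WT[0], WT[1], E30A[0], E30A[1])
```

Because ΦF is a ratio of two quantities computed with the same convention, it does not depend on which sign convention is chosen for ΔΔG.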

Loop Entropy Estimates. The change in the free energy of the
unfolded state as a result of loop insertion or disulfide crosslinking
can be estimated from polymer theory (16):

$$\Delta G = -RT \ln(L / L_0), \tag{5}$$

where L0 and L are loop lengths before and after the modification,
respectively. In the case of the 10-glycine insertion in the
n-src loop, the original loop length is 5 and after the insertion it
is 15. In the case of the three disulfide crosslinks, which generate
a loop in the protein that did not exist previously, L0 is taken to
be 1, and L is the length of the loop enclosed by the crosslink.
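
For the 10-glycine insertion, this estimate can be evaluated directly. The Python sketch below is a minimal worked example that assumes R = 1.987 × 10⁻³ kcal mol⁻¹ K⁻¹ and the experimental temperature of 295 K; the function name is hypothetical.

```python
import math

def loop_free_energy_change(l_after, l_before, temperature=295.0):
    """Polymer-theory estimate, -RT ln(L/L0), in kcal/mol."""
    rt = 1.987e-3 * temperature
    return -rt * math.log(l_after / l_before)

# n-src loop: 5 residues before the insertion, 15 after.
dg_insertion = loop_free_energy_change(15, 5)
```

With these inputs the estimate is about -0.64 kcal/mol, that is, lengthening the loop from 5 to 15 residues is expected to lower the free energy of the unfolded state by roughly RT ln 3.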

Results 

Hydrogen Bond Network Between the Distal β-Hairpin and the Diverging
Turn. Mutagenic analysis of the SH3 domain folding
transition state revealed the clustering of structured residues in
the distal β-hairpin and the diverging turn (12). Mutagenesis also
suggested that these elements might interact with each other at
the rate-limiting step, because mutations in the hydrogen bond
network between the distal β-hairpin and the diverging turn (Fig.
1B) had a dramatic effect on the rate of folding. In this study, we
performed double-mutant cycles on these hydrogen bond network
residues (S47 and T50 in the distal β-hairpin and E30 in the
diverging turn) to quantify interaction energies at the transition
state (4). Both E30A_S47A (Fig. 2A) and E30A_T50A (Fig. 2B)
double mutants are considerably less destabilized than expected
from the sum of the single-mutant effects (Table 1): in the native
state, the interaction energy between the two mutated residues
is 1.02 kcal/mol for E30A and S47A (Fig. 2D) and 1.08 kcal/mol
for E30A and T50A. A large fraction of this interaction energy
is present at the folding transition state: 0.78 kcal/mol for E30
and S47 and 0.83 kcal/mol for E30 and T50, yielding interaction
ΦF-values of 0.76 and 0.77, respectively. The finding that these
hydrogen bonds are already formed at the transition state
confirms directly the association of the distal β-hairpin and the
diverging turn in the transition state inferred from the point
mutagenesis experiments.
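
The double-mutant cycle arithmetic can be retraced from the rate constants in Table 1. The Python sketch below is an illustration under assumptions, not the authors' calculation: it takes the interaction energy to be the sum of the two single-mutant effects minus the double-mutant effect, uses RT at 295 K with R = 1.987 × 10⁻³ kcal mol⁻¹ K⁻¹, and reports magnitudes. Because the table entries are rounded, the results come out close to, but not identical with, the values quoted in the text. The dictionary and function names are hypothetical.

```python
RT = 1.987e-3 * 295.0  # kcal/mol at 295 K

# (ln kf without denaturant, ln ku in 3 M Gnd) as listed in Table 1.
TABLE1 = {
    "WT": (4.10, 0.139),
    "E30A": (2.03, 1.48),
    "S47A": (2.01, 0.537),
    "T50A": (1.99, 0.750),
    "E30A_S47A": (1.21, 1.39),
    "E30A_T50A": (1.32, 1.66),
}

def ddg_native(name):
    """Stability change relative to WT (kcal/mol, negative if destabilized)."""
    kf_wt, ku_wt = TABLE1["WT"]
    kf, ku = TABLE1[name]
    return RT * ((kf - kf_wt) + (ku_wt - ku))

def ddg_transition(name):
    """Change in folding activation free energy relative to WT (kcal/mol)."""
    return RT * (TABLE1[name][0] - TABLE1["WT"][0])

def interaction_energy(a, b, ddg):
    """Magnitude of the coupling energy from a double-mutant cycle."""
    return abs(ddg(a) + ddg(b) - ddg(f"{a}_{b}"))

native_s47 = interaction_energy("E30A", "S47A", ddg_native)
ts_s47 = interaction_energy("E30A", "S47A", ddg_transition)
phi_int_s47 = ts_s47 / native_s47

native_t50 = interaction_energy("E30A", "T50A", ddg_native)
ts_t50 = interaction_energy("E30A", "T50A", ddg_transition)
phi_int_t50 = ts_t50 / native_t50
```

In both cycles most of the native-state coupling energy is already present in the transition state, in line with the conclusion drawn in the text.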

The E30A mutation removes a large portion of this buried
residue and is therefore expected to be quite disruptive. As a less
drastic probe of the interactions in the hydrogen bond network,
we compare the kinetic effect of the S47A mutation at pH 3 and
pH 6 (Fig. 2C and Table 1). At pH 3, the carboxyl group of E30
probably is partially protonated (depending on the local pKa),
thus disrupting some of its interactions with the distal β-hairpin
(S47). The effect of S47A on stability and the rate of folding is
smaller at pH 3 than at pH 6, but the ΦF-value is still 1. This effect
is consistent with the idea that some of the interactions between
S47 and E30 present at the transition state at pH 6 can be
disrupted by low pH.
Fig. 2. Kinetic analysis of E30A_S47A (A) and E30A_T50A (B) in 50 mM NaPi, pH 6, at 295 K. Gnd, guanidine. (C) Kinetic analysis of WT and S47A mutant in 50
mM NaPi, pH 3, at 295 K. Solid lines indicate the best fit to the data with kaleidagraph. (D) Double-mutant cycle analysis to determine the interaction energy
between E30A and S47A in the native (ΔΔGU-F) and transition states (ΔΔGU-‡). Data for the calculations are presented in Table 1. ΔΔGU-‡ is calculated as in Eq. 2.
Energies for the E30A_T50A double mutant are not shown explicitly but can be calculated similarly by using data from Table 1.


Backbone Restriction in the Folding Transition State. In addition to
manipulating interaction energies of side chains by point mutagenesis,
we can alter the entropy of the protein backbone by
engineering loop insertions and covalent crosslinks and then use
the effect of such changes on the folding kinetics as a reporter
on the degree of association of the structural elements in the
transition state. As in Φ-value analysis, the premise of these
experiments is that changes in the rates of folding and unfolding
will reveal the extent to which two elements are brought together
in the transition state compared with the denatured and native
states. To facilitate interpretation of the kinetic results, it is
reasonable to assume that the primary effect of these modifications
is on chain conformational entropy. The denatured state,
which is most disordered, is expected to be affected significantly
Table 1. Kinetic parameters for the hydrogen bond network mutants

Mutant       ln(kf), 0 M   ln(ku), 3 M   mf     mu     ΔGU-F   ΔΔGU-F
WT*          4.10          0.139         1.02   0.54   3.95    –
E30A*        2.03          1.48          1.09   0.65   2.28    -2.00
S47A*        2.01          0.537         1.50   0.44   2.18    -1.46
T50A*        1.99          0.750         1.84   0.47   2.14    -1.60
E30A_S47A    1.21          1.39          †      0.46   1.28    -2.43
E30A_T50A    1.32          1.66          †      0.39   0.977   -2.52
WT_pH3       2.08          1.69          1.08   0.41   1.46    -2.09
S47A_pH3     0.052         1.59          1.45   0.38   0.249   -3.23

Kinetics of folding and unfolding were followed by changes in tryptophan
fluorescence on a stopped flow instrument at 295 K; kf is reported in the
absence of denaturant, and ku is in 3 M Gnd to avoid extrapolation; mf and mu
are the dependences of the folding and the unfolding rates, respectively, on
Gnd. ΔGU-F (free energy of unfolding) and ΔΔGU-F (the difference in ΔGU-F
between WT and the mutant proteins) were calculated from the kinetic
parameters as described in the Methods section. Typical errors for the kinetic
measurements are 1-10% as reported in ref. 13.

*Kinetic data for these mutants were published previously in ref. 12.

†These values could not be estimated reliably because of the small region over
which kf can be measured.


by the modifications: glycine insertion in loops increases the
entropy of the denatured state and thus lowers its free energy,
whereas disulfide crosslinks decrease the entropy of the denatured
state and destabilize it in proportion to the length of the
crosslinked fragment (Fig. 3). The entropy of the native state, on
the other hand, should not change greatly in the case of the
disulfide crosslinks, because the elements we are probing already
interact fully in the native state. In the case of the 10-glycine
mutant, the entropy of the native state will increase but to a
lesser extent than the entropy of the denatured state. There
might be some destabilization of the native state resulting from
disruption of local interactions at the site of glycine insertion or
strain from suboptimal disulfide geometry; however, we have
tried to minimize these effects by choosing the modification sites
carefully. The entropic effect on the transition state then depends
on the proximity of the structural elements being probed
and can be deduced from the changes in kf and ku. If only kf is
affected, it can be concluded that the crosslinked regions are as
ordered in the transition state as in the native state (Fig. 3 A and
B), whereas, if only ku changes, the region is likely to be as
disordered in the transition state as in the denatured state.
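The Methods section that defines how ΔGU-F is obtained from the kinetic parameters is not reproduced here, so the following Python sketch rests on an assumption: that each rate is extrapolated linearly to 0 M denaturant and that mf and mu are expressed in kcal/(mol M). Under that assumption the function (its name and argument names are illustrative only) reproduces the tabulated ΔGU-F values of Tables 1 and 2 to within rounding, which suggests, but does not prove, that this is the convention used.

```python
R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def delta_g_unfolding(ln_kf, ln_ku, m_f, m_u, c_f, c_u, temp_k=295.0):
    """Free energy of unfolding at 0 M denaturant, in kcal/mol.

    ln_kf is measured at denaturant concentration c_f (M) and ln_ku at
    c_u (M). ASSUMPTION: m_f and m_u are in kcal/(mol M), so the linear
    extrapolation of RT*ln(kf/ku) to 0 M adds m_f*c_f + m_u*c_u.
    """
    rt = R_KCAL * temp_k
    return rt * (ln_kf - ln_ku) + m_f * c_f + m_u * c_u


# Table 1 rows: kf in 0 M Gnd, ku in 3 M Gnd (values copied from the table).
wt_table1 = delta_g_unfolding(4.10, 0.139, 1.02, 0.54, c_f=0.0, c_u=3.0)
e30a_table1 = delta_g_unfolding(2.03, 1.48, 1.09, 0.65, c_f=0.0, c_u=3.0)

# Table 2 rows: kf in 1 M Gnd, ku in 5.5 M Gnd.
wt_table2 = delta_g_unfolding(2.36, 2.44, 1.02, 0.54, c_f=1.0, c_u=5.5)
gly10_table2 = delta_g_unfolding(0.997, 2.30, 1.20, 0.44, c_f=1.0, c_u=5.5)

print(f"WT (Table 1):    {wt_table1:.2f} kcal/mol (tabulated 3.95)")
print(f"E30A (Table 1):  {e30a_table1:.2f} kcal/mol (tabulated 2.28)")
print(f"WT (Table 2):    {wt_table2:.2f} kcal/mol (tabulated 3.95)")
print(f"10gly (Table 2): {gly10_table2:.2f} kcal/mol (tabulated 2.86)")
```

The same wild-type stability is recovered from both data sets even though the rates were measured at different denaturant concentrations, which is the consistency one would expect if the linear extrapolation holds.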

Glycine Insertion in the n-src Loop. The involvement of the n-src
loop in the transition state was difficult to evaluate from the
point mutagenesis analysis because of the very small effect on
stability of all of the mutations in this region (13). An insertion
of 10 glycine residues in the n-src loop between residues 40 and
41 (Fig. 3A) was designed to test the importance of this loop in
bringing the distal β-hairpin and the diverging turn together at
the transition state. The addition of 10 glycine residues considerably
increases the entropic cost of bringing these two elements
together and is expected to decrease the rate of folding if
association of the two elements is required in the transition state.
A correlation between the rate of folding and the length of this
loop was observed in a comparison of homologous SH3 domains
(17): the phosphatidylinositol 3-kinase SH3 domain has the
longest n-src loop, and its folding rate is the slowest of the SH3
domains that have been characterized (18). Furthermore, in a
Fig. 3. Characterization of the 10-glycine insertion (A), the distal hairpin crosslink (B), the RT loop crosslink (C), and the N- and C-terminal crosslink mutants
(D). For each modification, panels from Top to Bottom show site of the modification, kinetic analysis, and an energy diagram explaining the effects on kinetics.
The structure diagrams of the SH3 domain were generated with molscript (33) and raster3d (34, 35). Energy diagrams show the relative free energies of the
unfolded (U), the transition (‡), and the native state (N) before (thick lines) and after (thin lines) each modification. In the case of the 10-glycine insertion, the
reference protein is the WT, and, for the three disulfide crosslinks, the reference is the reduced version of the particular double cysteine mutant. Solid arrows
indicate the experimentally determined changes in the free energy of denatured state, and dashed arrows indicate the changes predicted from loop entropy
estimates (16). 10gly, 10-glycine; NC, N- and C-terminal crosslink; SS, distal hairpin crosslink; red, reduced; ox, oxidized.


combinatorial mutagenesis experiment on the SH3 domain,
phage-display selection of correctly folded proteins yielded an
enrichment for shortened n-src loops (19). The 10-glycine insertion
leads to an overall decrease in stability of 0.72 kcal/mol
(Table 2), consistent with loop entropy estimates (-1.5RT ln(L/L0)
= 0.78 kcal/mol, where L0 and L are the original and the new
loop lengths, respectively; ref. 16). Remarkably, the kinetic
effect of the insertion (Fig. 3A and Table 2) is exclusively on the
rate of folding, kf. The structural elements on either end of the
n-src loop (the distal β-hairpin and the diverging turn) seem to
be fully associated at the folding transition state, consistent with
the double-mutant results (Fig. 2). The lack of effect of such a
large insertion on the unfolding rate, ku, is striking and suggests
that unfolding is initiated by the dissociation of the two terminal
strands and the RT loop but does not involve disruption of the
n-src loop.
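The loop entropy estimate quoted for the glycine insertion, 1.5RT ln(L/L0), can be evaluated directly. The Python sketch below is a worked arithmetic check under stated assumptions: the temperature is taken as 295 K (the temperature of the kinetic measurements), the function names are illustrative, and the text does not state L0 or L. Rather than guess them, the sketch also inverts the formula to find the length ratio implied by the quoted 0.78 kcal/mol; the pair L0 = 7, L = 17 is shown only as one hypothetical pair that matches a 10-residue insertion.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def loop_entropy_cost(l0, l_new, temp_k=295.0):
    """Magnitude of the loop-closure penalty 1.5*RT*ln(L/L0), in kcal/mol."""
    return 1.5 * R_KCAL * temp_k * math.log(l_new / l0)


def implied_length_ratio(cost_kcal, temp_k=295.0):
    """Loop length ratio L/L0 that would give the stated entropic cost."""
    return math.exp(cost_kcal / (1.5 * R_KCAL * temp_k))


# Ratio implied by the 0.78 kcal/mol estimate quoted in the text.
ratio = implied_length_ratio(0.78)

# HYPOTHETICAL loop lengths (not given in the text) that differ by the
# 10 inserted glycines and come close to that ratio.
example_cost = loop_entropy_cost(7, 17)

print(f"implied L/L0: {ratio:.2f}")
print(f"cost for L0=7, L=17: {example_cost:.2f} kcal/mol")
```

Because the penalty grows only logarithmically with loop length, even a large insertion costs less than 1 kcal/mol, which is consistent with the modest 0.72 kcal/mol destabilization that was measured.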

Crosslinking the Distal β-Hairpin. Φ-Value analysis of both the src
and the α-spectrin SH3 domains highlighted the importance of
the distal β-hairpin in the folding transition state (12, 20). For
both proteins, the tip of the distal β-hairpin is the only region
that contains residues with ΦF-values equal to 1 (in the SH3
domain, S47, T50, and G51 have all of their local interactions
formed at the transition state). The middle of the hairpin,
however, appeared somewhat flexible (L44A and Y55A on the
solvent-exposed side of the hairpin have intermediate Φ-values),
and there were no suitable mutations to probe strand association
at the base of the hairpin. To examine specifically the backbone

Table 2. Kinetic parameters for 10-glycine insertion and disulfide
crosslink mutants

Mutant    ln(kf)^1M   ln(ku)^5.5M   mf      mu      ΔGU-F   ΔΔGU-F
WT        2.36        2.44          1.02    0.54    3.95    -
10gly     0.997       2.30          1.20    0.44    2.86    -0.72
SS_red    2.75        3.38          1.05    0.33    2.50    -0.32
SS_ox     6.50        3.37          0.750   0.59    5.83    1.88
RT_red    2.65        2.37          1.18    0.39    3.47    0.21
RT_ox     3.81        0.62          0.78    0.42    4.98    1.92
NC_red    2.58        1.59          1.03    0.50    4.36    0.63
NC_ox     4.73        -1.30         0.766   0.80    8.70    3.58

Kinetics of folding and unfolding were followed by changes in tryptophan
fluorescence on a stopped flow instrument at 295 K; kf is reported in 1 M Gnd,
and ku is in 5.5 M Gnd to avoid extrapolation; mf and mu are the dependences
of the folding and the unfolding rates, respectively, on Gnd. ΔGU-F (free
energy of unfolding) and ΔΔGU-F (the difference in ΔGU-F between the
particular mutant protein and WT) were calculated from the kinetic parameters
as described in the Methods section. Typical errors for the kinetic
measurements are 1-10% as reported in ref. 13. Abbreviations are defined in the
legend to Fig. 3.
conformation of the β-hairpin in the transition state, we tested
the kinetic consequences of covalently crosslinking the hairpin.
For this purpose, W43 and S58 at the base of the distal β-hairpin
(Fig. 3B) were mutated to cysteines and then crosslinked by using
an oxidizing agent (see Methods section). Under reducing conditions,
the double-cysteine mutant (SS mutant) is destabilized
compared with the WT SH3 domain; however, oxidation significantly
stabilizes the mutant protein: ΔΔGU-F between the
reduced and oxidized forms of the mutant is 2.55 kcal/mol. This
value matches closely the theoretical loop entropy estimate from
polymer theory (2.38 kcal/mol; see Methods section; ref. 16),
suggesting that stabilization results largely from the decrease in
entropy of the denatured state. Kinetic analysis reveals that the
oxidized protein folds 30 times faster than the reduced form,
whereas the unfolding rate is virtually unchanged (Fig. 3B and
Table 2). The resulting ΦF-value of 1 for the disulfide crosslink
unambiguously confirms that the distal β-hairpin is conformationally
restricted at the transition state (Fig. 3B). As in the case
of the glycine insertion, the lack of effect of the crosslink on ku
is remarkable: the unfolding event must not involve even partial
unraveling of the distal β-hairpin. Flexibility in the middle of the
hairpin therefore does not prevent the conformational locking of
the hairpin's base.

Crosslinking the RT Loop. To probe the extent of formation of the
RT loop in the folding transition state, we have introduced a
disulfide crosslink at its base. T9 at the N terminus of the protein
and Q33 after the diverging turn were mutated to cysteine and
oxidized to close off a loop of 23 residues covalently (Fig. 3C).
Oxidation stabilizes the protein by 1.71 kcal/mol, suggesting that
the disulfide bond geometry is favorable and does not introduce
strain in the native state of the protein. This stabilization,
however, is less than the loop entropy reduction estimate (2.75
kcal/mol; see Methods section; ref. 16), suggesting that the RT
loop might be partially structured in the denatured state. Kinetic
analysis (Fig. 3C and Table 2) shows that, in contrast to the distal
hairpin crosslink, formation of the RT loop crosslink dramatically
decreases the unfolding rate, suggesting that the rate-limiting
step in unfolding involves the opening of the RT loop.
Crosslinking also increases the folding rate, indicating that parts
of the RT loop might be structured in the transition state.
Considering the point mutagenesis results that revealed Φ-values
uniformly close to 0 throughout the N-terminal strand and the
tip of the RT loop, it is most likely that the crosslink itself has
caused an expansion of the structured region of the transition
state to include the RT loop. Another possibility is that the RT
loop is stabilized primarily by backbone hydrogen bonds and not
by side chain-side chain interactions in the transition state.

Crosslinking the N and C Termini. The point mutagenesis analysis
suggested that the N and C termini do not associate at the folding
transition state, because none of the mutations in this region
affect the rate of folding. We probed the association between the
N- and C-terminal strands in the transition state by engineering
a disulfide crosslink between them. Two cysteine mutations were
introduced in the SH3 domain (T9C at the N terminus and S64C
at the C terminus) and then crosslinked as described for the
distal hairpin. Comparison of the T9C_S64C mutant (NC mutant)
under reducing and oxidizing conditions (Fig. 3D and Table
2) reveals that the oxidized protein is stabilized significantly
(ΔΔGU-F = 2.8 kcal/mol) but less than expected from the effect
of crosslinking on the entropy of the denatured state (ΔΔG =
3.52 kcal/mol; see Methods section; ref. 16; the denatured state
may be more ordered than the random coil model assumed in the
loop entropy estimate). Kinetic measurements show that both
the folding and the unfolding rates of the SH3 domain are
affected roughly equally by the disulfide crosslink. In general,
circularization is always expected to increase kf because of the
greater decrease in the entropy of the denatured state compared
with that of the transition state, but will decrease ku only if the
termini are apart at the transition state. (In proteins, like
acyl-CoA-binding protein, whose termini interact at the transition
state, crosslinking would be expected to affect primarily the
rate of folding; ref. 7.) The decrease in the unfolding rate brought
about by the NC crosslink suggests that the termini are not as
ordered in the SH3 transition state as they are in the native state.
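The three crosslink results can be compared on one scale by partitioning the effect of oxidation between the folding and unfolding rates. The Python sketch below is a rough illustration, not the authors' calculation: it assumes the usual kinetic definition Φ = Δln kf / (Δln kf - Δln ku), in which the RT factor cancels, and it uses the Table 2 rates as listed (kf in 1 M Gnd, ku in 5.5 M Gnd) without any m-value correction, so the numbers are approximate. The function and dictionary names are illustrative.

```python
def crosslink_phi(ln_kf_red, ln_ku_red, ln_kf_ox, ln_ku_ox):
    """Fraction of the crosslink's stabilization already felt at the
    transition state, from the changes in ln(kf) and ln(ku) on oxidation."""
    d_ln_kf = ln_kf_ox - ln_kf_red   # folding speeds up when positive
    d_ln_ku = ln_ku_ox - ln_ku_red   # unfolding slows down when negative
    return d_ln_kf / (d_ln_kf - d_ln_ku)


# (ln kf red, ln ku red, ln kf ox, ln ku ox), copied from Table 2.
table2_rates = {
    "SS (distal hairpin)": (2.75, 3.38, 6.50, 3.37),
    "RT (RT loop)": (2.65, 2.37, 3.81, 0.62),
    "NC (termini)": (2.58, 1.59, 4.73, -1.30),
}

phi = {name: crosslink_phi(*rates) for name, rates in table2_rates.items()}

for name, value in phi.items():
    print(f"{name}: phi = {value:.2f}")
```

With these inputs the distal hairpin crosslink gives a value close to 1, whereas the RT loop and terminal crosslinks give intermediate values, in line with the qualitative statements that the hairpin crosslink acts only on kf and that the other two crosslinks change both rates.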

Discussion 

SH3 Folding Transition State. Φ-Value analysis has become the
method of choice for studying folding transition states (4, 5). In
this study, we have extended the repertoire of probes of
the transition state ensemble to include glycine insertion and
disulfide crosslinking as direct experimental measures of the
conformational constraints on the peptide backbone and
the association of structural elements. Engineering of disulfide
crosslinks has been used to assess the effect of reducing
conformational entropy in different parts of the molecule (21-25) or
to explore transition-state heterogeneity (26, 27). Loop insertions
have been used to explore the role of loops in determining
protein stability and the folding mechanism (28-30).

Our current findings from the double-mutant analysis, 10-glycine
insertion, and covalent crosslinking combined with the
point mutagenesis results provide a comprehensive picture of the
folding transition state for the SH3 domain. The distal β-hairpin
still stands out as the structural element best formed in the
transition state; however, now we have information about strand
pairing along the entire length of the hairpin. Previous experiments
had indicated that the tip of the hairpin is well ordered as
judged by the clustering of high Φ-value residues; however, the
middle of the hairpin is likely not as rigid in the transition state
as in the native state, because solvent-exposed residues are
paired only partially with each other. The finding that the distal
β-hairpin crosslink increases the rate of folding and has no effect
on the unfolding rate suggests that the two strands come in close
proximity at the base of the hairpin before the folding transition
state. Interestingly, similar results were found for one of the
hairpins in protein L (38), suggesting that "looping" in the
middle of the hairpin might be a common theme during protein
folding. Constraining the tip and the base of the hairpin may be
important for specifying the topology of the folding protein,
whereas keeping the middle flexible would allow hydrophobic
core rearrangements after the transition state.

The current experiments establish firmly the interaction of the
distal β-hairpin and the diverging turn in the transition state. The
double-mutant results provide concrete evidence that the hydrogen
bond network between these two elements is mostly
formed at the transition state. The high Φ-values for these
nonlocal hydrogen bond interactions (0.78 and 0.83) indicate
that most of the interaction energy is present at the rate-limiting
step; further alignment of the hydrogen bond geometries and
immobilization of the participating residues after the transition
state must contribute the remaining ~20%. It is generally
assumed that transition states for folding are stabilized primarily
by hydrophobic interactions, with hydrogen bonds contributing
only later to the stability of the native state because of their
stricter geometric requirements. The SH3 domain is the first case
in which nonlocal side chain hydrogen bonds have been found to
stabilize the transition state (31). The hydrogen bond network
does not, however, seem to be required for folding (e.g., to confer
specificity or determine the alignment of structural elements). In
the α-spectrin SH3 domain, for example, this interaction is
replaced by a hydrophobic cluster, and in the fyn SH3 domain
(78% homologous to the src SH3 domain), the same hydrogen bond
network is in place in the native state but does not contribute to
stabilization of the transition state (A. Davidson, personal
communication). The difference between the src and fyn SH3
domains may be a result of the slightly different structures of
their diverging turns: perhaps the two large phenylalanines in fyn
(instead of Phe and Leu in src) cannot pack closely next to each
other and interact with the distal β-hairpin until after the
transition state. The local variation in the types of interactions
stabilizing the transition state is an illustration of the lack
of selective pressure on the details of the protein-folding
mechanism (32).

The association of the distal β-hairpin and the diverging turn
at the transition state is confirmed further by the 10-glycine
insertion into the n-src loop connecting the two elements. The
exclusive effect of the insertion on the folding rate strongly
suggests that the peptide backbone of the regions flanking the
loop is constricted at the rate-limiting step. In marked contrast,
all previous loop-insertion experiments revealed that both folding
rates decrease and unfolding rates increase as loops lengthen.
The context dependence of loop lengthening has been noted
before; however, it has been explained mostly in terms of the
flexibility of the region in the native state (28-30). Our findings
indicate that the kinetic effect of loop elongation is related
directly to the extent to which the elements connected by the
loop are topologically constrained at the transition state.

The effects of the disulfide crosslinks and the loop insertion on
the unfolding rate highlight the pronounced hierarchy of events.
The distal β-hairpin crosslink and the glycine loop insertion have
no effect on ku, clearly indicating that the three-stranded sheet
formed by the interaction of the distal β-hairpin and the
diverging turn remains intact at the unfolding transition state. In
contrast, the RT loop crosslink and the NC crosslink dramatically
slow down the unfolding rate, suggesting that the rate- [...]
an even greater structural polarization of the folding transition
state than suggested by our previous studies.

The conformational restriction of structural elements in the
src SH3 domain transition state has implications for the mechanism
of folding. The transition state ensemble consists of a
relatively small number of conformers, all of which have the
distal β-hairpin and the diverging turn ordered and interacting
[...] and the N and C termini. The energy landscape of this protein,
therefore, significantly deviates from a symmetrical funnel in
[...] particular degree of freedom. The folding of the src SH3 domain
is surprisingly consistent with the more traditional single
pathway-based picture of protein folding.
Triple-resonance NMR experiments were used to assign the 13Cα, 13Cβ,
15N and NH resonances for all the residues in the denatured state of a
destabilized protein L variant in 2 M guanidine. The chemical shifts of
most resonances were very close to their random coil values. Significant
deviations were observed for G22, L38 and K39; increasing the denaturant
concentration shifted the chemical shifts of these residues towards
their random coil values. Medium-range nuclear Overhauser enhancements
were detected in segments corresponding to the turn between the
first two strands, the end of the second strand through the turn between
the second strand and the helix, and the turn between the helix and the
third strand in 3D 15N-HSQC-NOESY-HSQC experiments on perdeuterated
samples. Longer-range interactions were probed by measuring the
paramagnetic relaxation enhancement produced by nitroxide spin labels
introduced via cysteine residues at five sites around the molecule.
Damped oscillations in the magnitude of the paramagnetic relaxation
enhancement as a function of distance along the sequence suggested
native-like chain reversals in the same three turn regions. The more
extensive interactions within the region corresponding to the first β-turn
than in the region corresponding to the second β-turn suggest that the
asymmetry in the folding reaction evident in previous studies of the
protein L folding transition state is already established in the denatured
state.
Introduction

How a protein folds into a unique three-dimensional
structure is one of the greatest questions of
modern structural biology. An exciting recent
development is the direct structural characterization
of the starting point of the folding reaction,
the denatured state, using multi-dimensional NMR
methods. It has been found (Neri et al., 1992;
Logan et al., 1994; Shortle 1996a,b; Eliezer et al.,
1998; Fong et al., 1998; Mok et al., 1998) that the
unfolded states of several proteins under both
denaturing and native solution conditions can contain
a significant amount of residual structure.
Recent studies under non-denaturing conditions


Abbreviations  used:  PRE,  paramagnetic  relaxation 
enhancement;  HSQC,  heteronuclear  single  quantum 
coherence;  NOESY,  NOE  spectroscopy;  CD,  circular 
dichroism. 
have demonstrated that the denatured state ensemble
of staphylococcal nuclease has a native-like
overall topology (Gillespie & Shortle, 1997), while
the overall topology of the drk SH3 unfolded states
is not native-like, although it has some native-like
features (Mok et al., 1998). The detailed structural
characterization of the denatured states of proteins
whose folding transition has been extensively studied
should increase understanding of the early
stages of the folding process.

We have chosen the IgG binding domain of protein L
as a model system for understanding folding
in detail. The folding of protein L has been characterized
using a wide range of methods (Yi &
Baker, 1996; Scalley et al., 1997; Yi et al., 1997; Gu
et al., 1997; Plaxco et al., 1999; Kim et al., 1999) and
the folding transition state has been extensively
mapped through the analysis of the effect of 70
point mutants distributed around the protein.
A destabilized mutant (F20W/Y32A) of protein L
has recently been characterized by circular dichroism
and stopped-flow kinetics (Scalley et al., 1999).
These data suggested the presence of residual
structure in 2 M to 3 M guanidinium chloride. In
this study we use NMR methods to gain more
specific structural information on this denatured
state of protein L.

Results 

Assignment of the backbone 13Cα, 13Cβ, 15N
and NH resonances of F20W/Y32A

Characterization of denatured states of proteins
using NMR techniques is often challenging, since
the chemical shift dispersion of most resonances is
poor because of conformational averaging. However,
the backbone 15N and 13C chemical shifts,
which are mainly influenced by residue type and
the local amino acid sequence (Braun et al., 1994;
Yao et al., 1997), remain well dispersed in the
denatured state. Using 13C/15N-double labeled protein
sample and triple resonance NMR experiments
(see Materials and Methods), we have assigned the
13Cα, 13Cβ, 15N and NH resonances for all the residues
of the F20W/Y32A mutant in 2 M guanidine
at pH 5.0.

13Cα and 13Cβ chemical shifts are predominantly
determined by backbone conformation, and the
perturbations of the 13Cα and 13Cβ chemical shifts
from their random coil values (Spera & Bax, 1991;
Wishart & Sykes, 1994) are reliable indicators of
secondary structure in folded proteins. In general,
13Cα resonances are shifted downfield by an average
of 2.6 ppm for α-helices, and shifted upfield by
1.7 ppm for β-sheets. Figure 1 shows that the deviations
of the 13Cα chemical shifts of F20W/Y32A in
2 M guanidine at pH 5.0 from the random coil
values are very small, indicating the population of
regular secondary structure is very low under
these conditions. However, small but consistent
upfield chemical shift perturbations were observed
for every residue of the segment from A31 to D41
(helical in native protein L structure), suggesting
that there is some residual helical content in
F20W/Y32A in 2 M guanidine. Also, small, but
consistent downfield chemical shift perturbations
were observed for every residue from Y45 to A50
(the third strand in native protein L structure),
suggesting that this segment has some preference
for extended conformations under these conditions.

Previous chemical denaturation studies using
fluorescence techniques suggested that there may
be some residual structure around W20 in 2-3 M
guanidine that is lost at higher (>3 M) guanidine
concentrations (Scalley et al., 1999). A guanidine
titration ranging from 1.5 M to 5 M guanidine was
carried out to investigate possible conformational
changes in the denatured state ensemble. There is
no dramatic change in the 1H-15N HSQC spectra
from 1.5 M to 5 M guanidine except that small, but
significant chemical shift perturbations (~0.10 ppm
shifting toward the random coil values) were
observed for the amide group protons of G22, L38


Figure 1. Chemical shift perturbations of the 13Cα resonances
of F20W/Y32A in 2 M guanidine from the random
coil values. The two straight horizontal lines at
2.6 ppm and -1.7 ppm represent the perturbations
expected for regular α-helix and β-sheet conformations
respectively. The random coil values were taken from
Wishart & Sykes, 1994.


and K39 (Figure 2(a)). These shifts are roughly linear
functions of the denaturant concentration; there
is little indication of a cooperative transition
between different populations (Figure 2(b)). These
results suggest that there is some residual structure
around G22, L38 and K39 in denatured F20W/
Y32A in 2-3 M guanidine.

Nuclear  Overhauser  enhancements  (NOEs) 
between  amide  protons 

NOEs between amide group protons were
obtained using 3D 1H,15N-HSQC-NOESY-HSQC
experiments on perdeuterated F20W/Y32A in
2-3 M guanidine. Recent work on the drk SH3
domain demonstrated that deuteration greatly
enhances NOESY-based studies of denatured proteins
because of longer relaxation times due to
reduced spin diffusion (Sattler & Fesik 1996).
Sequential (i, i + 1) and (i, i + 2) HN-HN NOEs
were observed for most residues of F20W/Y32A in
2 M guanidine. Only a few medium-range (i, i + 3)
and (i, i + 4) HN-HN NOEs were detected
(Figure 3). All observed medium-range NOEs are
located in segments close to turn regions in native
protein L. These segments correspond to the first
hairpin turn, the end of the second β-strand before
the helix and the turn following the helix. Thus,
there are some native-like turn structures populated
in the denatured states of F20W/Y32A in 2-3 M
guanidine. This is consistent with the result
from guanidine titration experiments described
above. NOEs in the region corresponding to the
second β-turn in native protein L were conspicuously
absent.
Figure 2. Guanidine-dependence of amide proton
chemical  shifts,  (a)  Deviation  of  chemical  shifts  in  1.5  M 
from  values  in  5.0  M  guanidine  for  all  the  amide  group 
protons  of  F20W/Y32A/C63.  (b)  Guanidine  dependence 
of  the  amide  group  proton  chemical  shifts  of  G22,  L38 
and  K39. 


Measurement  of  paramagnetic  relaxation 
enhancement  (PRE)  by  nitroxide  spin  label 

To probe longer-range interactions, we examined
the paramagnetic relaxation enhancement of the
amide group protons due to introduced nitroxide
spin labels. This technique has been used successfully
to characterize the denatured state of staphylococcal
nuclease under native conditions
(Gillespie & Shortle, 1997). The advantage of PRE
is that the free electron from the nitroxide label
increases the relaxation rate of protons over a distance
of up to 20 to 25 Å. In contrast, the NOE
between two protons only extends up to 10 Å even
at very long mixing times (Mok et al., 1998). Thus,
PRE can be useful for studying weak long-range
molecular interactions in relatively disordered
denatured states.


Paramagnetic  relaxation  enhancement  increases 
relaxation  rates  in  a  distance-dependent  manner. 
The  enhancement  effect  is  described  by  the 
Solomon-Bloembergen  equations  (Solomon  & 
Bloembergen,  1956;  Kosen  1989): 

A(l/Ti)  =  ARi  =  2K(3tc/(1  +  (1) 

A(1/T2)  =  AKz  =  X(4tc  +  3tc/(1  +  coh^c))//-^  (2) 

where $K$ is $1.23 \times 10^{-32}$ cm$^6$ s$^{-2}$ for a nitroxide
radical, $r$ is the distance between the electron and the
proton, $\tau_c$ is the correlation time for the electron-proton
vector, and $\omega_H$ is the Larmor frequency of
the proton. Equations (1) and (2) are based on the
assumptions that the vector between the electron
and the proton is free to undergo isotropic
rotational diffusion, and that its length is fixed.
Both equations are valid only for relaxation due to
the magnetic interaction between a single unpaired
electron and a proton of a macromolecule when
$\tau_c$ is greater than $10^{-9}$ second and $\omega_H$ is between 400
and 600 MHz.
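
The distance dependence in equations (1) and (2) can be illustrated with a
minimal numerical sketch. The function names, the correlation time of 1 ns
and the 500 MHz proton frequency used below are illustrative assumptions,
not values taken from the analysis described here; the Larmor frequency
quoted in MHz is converted to an angular frequency before use.

```python
import math

K_NITROXIDE = 1.23e-32  # cm^6 s^-2, the constant quoted above for a nitroxide radical


def _spectral_term(tau_c, larmor_hz):
    """Return 3*tau_c / (1 + omega_H^2 * tau_c^2), with omega_H in rad/s."""
    omega_h = 2.0 * math.pi * larmor_hz  # assumption: convert Hz to rad/s
    return 3.0 * tau_c / (1.0 + (omega_h * tau_c) ** 2)


def delta_r1(r_angstrom, tau_c, larmor_hz):
    """Longitudinal enhancement, equation (1), in s^-1."""
    r_cm = r_angstrom * 1e-8  # K is given in cm-based units
    return 2.0 * K_NITROXIDE * _spectral_term(tau_c, larmor_hz) / r_cm ** 6


def delta_r2(r_angstrom, tau_c, larmor_hz):
    """Transverse enhancement, equation (2), in s^-1."""
    r_cm = r_angstrom * 1e-8
    return K_NITROXIDE * (4.0 * tau_c + _spectral_term(tau_c, larmor_hz)) / r_cm ** 6


if __name__ == "__main__":
    tau_c = 1e-9       # hypothetical correlation time, seconds
    larmor_hz = 500e6  # hypothetical 1H frequency, Hz
    for r in (10.0, 15.0, 20.0, 25.0):
        print(f"r = {r:4.1f} A  dR1 = {delta_r1(r, tau_c, larmor_hz):8.3f} s^-1"
              f"  dR2 = {delta_r2(r, tau_c, larmor_hz):8.3f} s^-1")
```

Because both enhancements scale as the inverse sixth power of the distance,
doubling the electron-proton distance reduces the effect 64-fold, which is
why residues close to the label are broadened beyond detection while those
beyond about 25 Å are essentially unaffected.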

To  provide  attachment  sites  for  the  spin  labels, 
cysteine  residues  were  introduced  one  at  a  time  at 
strategic  locations  on  the  surface  of  protein  L.  To 
minimize perturbations to the structure accompanying
the cysteine substitutions, positions were chosen at
which the residues are highly solvent-accessible
and make few interactions with other residues. To
obtain information on all parts of the protein, one
probe was introduced into each of the five secondary
structural elements in the protein. The sites at
which cysteine residues were introduced are
shown in Figure 4; E1C is at the N terminus, T17C
in the middle of the second strand, S29C in the
middle of the helix, T46C in the middle of the third
strand and C63 was added at the C terminus.
Nitroxide spin labels were attached to the introduced
cysteine residues as described in Materials
and Methods.

Direct measurement of the effect of the paramagnetic
relaxation enhancement on T1 and T2 can be
carried out using standard NMR techniques, such
as the inversion-recovery sequence and the CPMG
spin echo sequence, but these techniques can be
quite time-consuming. Instead, it is more efficient
to measure the decrease of peak intensity in
HSQC spectra. The peak linewidth in the HSQC
spectrum is increased due to faster transverse
relaxation (R2) of ¹H, and ultimately the peak
intensity is decreased. Figure 5 shows the PRE
effect on peak intensity by comparing
HSQC spectra of T17C* with and without the
unpaired free electron present (oxidized and
reduced forms respectively). The magnitude of the
PRE for each residue in each protein was determined
using the simulation method described by
Gillespie and co-workers (Gillespie & Shortle,
1997) (see Materials and Methods) and is shown in
Figure 6. The bars indicate the magnitude of the
PRE, the arrows indicate the position of the label,
and the boxes at the top indicate the positions of
the secondary structure elements. For a number of
residues (indicated by the hatched bars) close to
the introduced labels, the relaxation enhancement
was too great to be measured.

Figure 3. Schematic of the medium-range HN-HN NOEs observed in HSQC-NOESY-HSQC experiments with a
600 ms mixing time along with the protein sequence and the secondary structural elements of the native protein L.
The filled bars represent β-strands and the zigzag line represents α-helix. The NOEs refer to the NOEs observed
between the side-chain NεH of W20 and the backbone amide group protons.

To  facilitate  interpretation  of  the  PRE  results,  we 
have  compared  the  experimental  PRE  profiles  with 
profiles  generated  by  computer  simulation  for:  (i) 
the  native  state;  (ii)  a  model  of  the  denatured  state 
of  protein  L  in  which  local  sequence-structure 
relationships  are  preserved;  and  (iii)  a  random 
chain  model  of  the  protein  L  denatured  state  (see 
Materials  and  Methods).  The  PRE  profile  expected 
for an ensemble of configurations without any persistent
structure should smoothly decrease with
increasing  sequence  distance  from  the  introduced 
probe.  Such  a  smooth  decrease  is  clearly  evident  in 
the  random  chain  simulations  (Figure  7,  row  D). 
The  experimental  PRE  profiles  in  all  five  cysteine 
mutants  (Figure  7,  row  B)  show  the  expected 
decrease  in  relaxation  enhancement  with  distance 
from  the  spin  label  (indicated  by  the  arrows  in 
Figure  6).  Superimposed  on  the  gradual  decay  of 
relaxation  enhancement  effect  are  oscillations  not 
seen  in  the  random  chain  model  that  suggest  the 
presence of chain reversals.

Figure 4. Ribbon diagram of the protein L structure.
The positions where the spin labels were introduced are
highlighted in black.

The differences between the profiles in rows B and D in Figure 7
are likely to be due to residual structure in the protein
L denatured state. For example, for the E1C*
mutant, residues from N12 to T15, which are closer
to the nitroxide probe in the sequence, experienced
less broadening than residues from A18 to F24,
which are further in sequence from the labeled site.
In the S29C* sample, residues from N12 to T15 displayed
less broadening than residues from K5 to
A11. The PRE effect for residues from T25 to D41
in the profiles of T17C and S29C displays an oscillating
pattern, suggesting that helix or turn-like
residual structures may be present in the region
from T25 to D41, which is helical in the native
state of protein L. This is consistent with the
chemical shift perturbation in the region noted
above. Comparison of the peaks in the experimental
profile in Figure 7 row B to the profile expected
for the native structure (Figure 7, row A) provides
an indication of the extent to which the residual
structure in the denatured state is native-like. The
peaks in the region corresponding to the first
β-hairpin (near residue 21 in E1C, residue 7 in S29
and T46) in the experimental profiles mirror peaks
in similar locations in the profiles derived from the
native structure (row A). In contrast, the peak in
the region corresponding to the second β-hairpin
in the C63 derivative (near residue 51) has no
counterpart in the native profile. These results
suggest that native-like chain reversals are sampled
in the region corresponding to the first β-hairpin in
the denatured state, but in the region corresponding
to the second β-hairpin there is a chain reversal,
not in the β-turn, but near the middle of the
last strand. Comparison of the experimental profiles
in row B to those for the simulated denatured
state model with the local sequence-structure
propensities of the protein L sequence (Figure 7, row
C) provides some insight into the origin of the
residual structure in the protein L denatured state.
A peak observed in the experimental PRE profile
in the vicinity of the first hairpin in the E1C derivative
is also observed in the sequence-specific
denatured state model simulations (row C),
suggesting that the chain reversal in the vicinity of
the first β-turn is due, at least in part, to local
sequence propensities. Interestingly, the non-native
peak observed around residue 51 in the experimental
profile for the C63 derivative is also observed in
the sequence-specific denatured state model,
suggesting that this non-native feature is also due,
in part, to local sequence propensities. However,
much of the residual structure suggested by the
experimental PRE profiles is likely to result from
interactions longer in range than those captured by
the denatured state model, as the oscillations more
distant from the site of labeling in the experimental
profiles in row B are mostly absent from the simulated
profiles in row C.

Figure 5. ¹⁵N-HSQC spectra
of nitroxide-labeled T17C/F20W/
Y32A mutant in 50 mM sodium
phosphate and 2.2 M guanidine at
pH 5.0 and 22 °C. (a) Oxidized
form; (b) reduced form.

Figure 6. Paramagnetic relaxation enhancement of amide group proton resonances by the introduced nitroxide spin
labels. The sites of labeling are indicated by the open triangles on the horizontal axes, and also labeled in the upper
right corner of each profile. The hatched bars represent relaxation enhancement effects beyond the experimentally
measurable limit. The secondary structure of native protein L is schematically represented on the top of the Figure.


Figure 7. Comparison of experimental and simulated PRE profiles. Row A, simulated PRE profiles for the native protein L structure. Row B, experimental PRE profiles.
Row C, simulated PRE profiles for denatured state model with protein L-specific local sequence-structure biases. Row D, simulated PRE profiles for generic denatured state
model without local sequence-structure biases. The simulation methods used in rows A, C, and D are described in Materials and Methods.

Discussion

The NMR data presented here provide a picture
of the conformations sampled in the denatured
state of the F20W/Y32A mutant in 2 M guanidine.
Almost all residues have chemical shifts close to
their random coil values, and no long-range NOEs
were observed even on perdeuterated samples,
indicating considerable conformational averaging
and little long-range order. The chemical shift data,
the observed medium-range NOEs, and the oscillations
in the PRE profiles suggest that the greatest
deviations from random chain behavior are in the
N-terminal portion of the protein, and that these
involve primarily native-like turns and chain reversals.
The residues whose chemical shifts change the
most  with  increasing  guanidine  concentration  are 
just  before  and  just  after  the  central  helix  (G22,  L38 
and  K39).  The  observed  medium-range  NOEs  were 
in  regions  corresponding  to  the  turn  between  the 
first two β-strands (I9 to G13), the turn between
the second strand and the helix (A18 to E25), and
the turn following the helix (L38 to G43). The most
pronounced minima in the PRE profiles were
observed near the location of these turns in the
native state, with the largest effect in the first
β-turn (the dip between N12 and T15 is clearly evident
in both the E1C and the S29 labeled proteins
(Figure 6)). The absence of medium-range NOEs
in the region of the second β-turn suggests that this
turn is significantly less populated than the other
turns in the denatured state. The second β-turn
contains three consecutive residues with positive
φ angles, only one of
which is a glycine residue (Wikstrom et al., 1994).
The turn is thus likely to be under considerable
strain in the native state and, perhaps as a consequence
of this, the turn appears to be largely disrupted
in the rate-limiting step in the unfolding of
protein L (Gu et al., 1997). The absence of detectable
NOEs in the second β-turn in the denatured
state could also be due to such strain, and formation
of the turn during folding may be driven
by long-range interactions not present in the
denatured state.

The NMR studies described here are consistent
with the suggestion from earlier circular dichroism
(CD) and fluorescence studies of non-random
residual structure in the denatured state in the
region corresponding to the first hairpin and the
helix (Scalley et al., 1999). The denaturant dependence
of the fluorescence of W20 was found to be
quite different in a ten-residue peptide derived
from the protein L sequence (centered on W20)
from that in the denatured protein, suggesting
some residual structure around W20 in the
denatured protein. Consistent with this, we find
that the highest density of medium-range NOEs in
the denatured protein is around W20 (Figure 3).
The earlier CD studies suggested some residual
helix content in the denatured protein, and this is
not inconsistent with the oscillations in the PRE
profiles between T25 and D41 (this region is helical
in the native state). Dead-time labeling HD
exchange experiments on the denatured state
immediately after initiation of refolding in the
absence of denaturant suggest that the asymmetry
observed in 2 M guanidine is also present in the
absence of denaturant: the greatest protection from
exchange was in the first β-hairpin.

Studies of other denatured proteins in urea and
guanidine solutions have also found little long-range order.
A study of 434-repressor in 6 M urea (Neri et al.,
1992) revealed a native-like local hydrophobic
cluster, with the remainder of the protein largely
disordered. In a study of the urea and guanidine-denatured
FK506 binding protein (Logan et al.,
1994), significant populations of both native-like
and non-native-like residual local structures,
mainly turns and local helical contents, were
detected. More residual structure has been
observed in denatured states of a truncated version
of staphylococcal nuclease (Gillespie & Shortle,
1997) and the drk SH3 domain (Mok et al., 1998) in
the absence of denaturant. In the study of the
unfolded drk SH3 domain, most of the observed
long-range interactions disappeared upon the
addition of 2 M guanidine, but there were still a
few long-range NOEs detected in 2 M guanidine
(Mok et al., 1998). NMR characterization of the
denatured states of drk and staphylococcal nuclease
in the absence of denaturant was made possible
by their relatively high level of solubility;
unfortunately, we have not been able to identify
unfolded protein L mutants which are sufficiently
soluble in the absence of denaturant for NMR studies.
The origins of the differences in solubility of
the denatured states of different proteins are not
entirely clear; the high level of solubility of the
staphylococcal nuclease denatured state may, in part,
be due to electrostatic repulsion between monomers
resulting from the high net charge on the protein.

An issue of considerable current interest is how
residual structure in the unfolded state contributes
to the overall folding reaction. On the one hand,
native-like interactions in the denatured states can
limit the conformational search space during folding
and, on the other, non-native-like interactions
could create energy barriers that hamper protein
folding. For example, in the case of the drk SH3
domain, non-native interactions in the unfolded
state destabilize the native state and may slow the
folding process. Extensive characterization of the
effects of mutations on protein L folding (Kim et al.,
1999) has suggested that the first β-turn is largely
formed, and the second β-turn largely disrupted,
in the folding transition state ensemble. Here, we
observe medium-range NOEs and a significant dip
in the PRE profile in the region corresponding to
the first β-turn, but not the second, consistent with
the asymmetry of structure in the folding transition
state. Thus, the interactions formed in the protein
L denatured state do not appear to disfavor subsequent
folding events, and the asymmetry in protein
L folding appears quite early in the folding
process. The rate-limiting step in folding may
involve the entropically costly consolidation and/or
association of parts of the protein which are partially
ordered in the denatured state.

Materials  and  Methods 

Sample  preparation 

¹⁵N-labeled and ¹³C-labeled protein samples were
made by growing the transformed Escherichia coli cells in
Mops minimal medium. For the labeling, 99.9% (w/w)
¹⁵NH4Cl was used as the nitrogen source and 99.9%
(w/w) ¹³C-glucose as the carbon source in our media.
For ²H-labeled F20W/Y32A, a single colony of
E. coli BL21(DE3)/pLysS carrying the F20W/Y32A
plasmid was inoculated into 100 ml of M9 H2O medium
with 50 µg/ml carbenicillin, and grown at 37 °C until the
absorbance at 600 nm was approximately 0.6. The cells
were harvested by centrifuging the culture with an SS-34
rotor at 4000 rpm for ten minutes. The harvested cells
were resuspended with 10 ml of M9 ²H2O medium and
then transferred into 2 l of M9 ²H2O medium with
50 µg/ml carbenicillin. The cells were grown at 37 °C to
an absorbance at 600 nm of approximately 0.6, then
induced with 1 mM IPTG for ten hours before harvesting.
All labeled F20W/Y32A proteins were purified by
His-tag affinity column as described (Gu et al., 1995). For
the ²H-labeled F20W/Y32A, the level of deuteration
was estimated to be greater than 95% based on the result
from mass spectrometry.

Nitroxide labeling

Nitroxide groups can be introduced into proteins
through alkylation of the thiolate group of a cysteine
residue. Since protein L does not contain any cysteine
residues, site-directed mutagenesis was used to substitute
selected amino acid residues with cysteine. Five charged
or polar residues with highly solvent-exposed side-chains
were selected for cysteine mutation to minimize
possible conformational perturbations of mutagenesis.
All the mutants, E1C, T17C, S29C, T46C and +63C
(addition of a Cys residue to the C terminus), were spin
labeled as described (Mchaourab et al., 1996). The extent
of labeling was examined by MALDI and in all cases it
was greater than 95%.

NMR  experiments  and  data  processing 

Both HNCACB (Wittekind & Mueller, 1993) and
CBCACONH (Grzesiek & Bax, 1992, 1993) triple resonance
experiments were carried out on a Bruker DMX500
instrument with 1.5 mM ¹⁵N,¹³C-labeled F20W/Y32A in
10% ²H2O/90% H2O, 50 mM sodium phosphate and
2.2 M guanidine at pH 5.0 and 22 °C. Matrices of
40 × 40 × 512 complex points were acquired with spectral
widths of 7002.8, 2000.0, and 4496.4 Hz (F1, F2, F3)
for both HNCACB and CBCACONH. For both ¹³C and
¹⁵N dimensions (F1, F2) of these spectra, the sizes of the
time domain were doubled via forward-backward linear
prediction (Zhu & Bax, 1992). The data were zero-filled and
extracted (only 4.70 ppm-9.00 ppm of the acquisition
dimension was retained) to give final 3D data sets of
1024 × 80 × 80 real points. The guanidine titration was
carried out using ¹⁵N-labeled F20W/Y32A/C63 protein;
the HSQC spectrum of this protein, which was readily
purified in large amounts, is nearly identical with that of
F20W/Y32A. HSQC experiments were carried out on
samples equilibrated with 50 mM sodium phosphate
and guanidine concentrations of 1.5 M, 2.0 M, 2.5 M,
3.0 M, 3.5 M, 4.0 M and 5.0 M at pH 5.0 and 22 °C.

The ¹H,¹⁵N-HSQC-NOESY-HSQC (Zhang et al., 1997)
experiment was performed on a three-channel Varian
Inova 500 MHz spectrometer with 1.2 mM perdeuterated
²H,¹⁵N-labeled F20W/Y32A in 10% ²H2O/90%
H2O, 50 mM sodium phosphate and 2.2 M guanidine at
pH 5.0 and 5 °C. The HSQC spectrum of F20W/Y32A
under 2.2 M guanidine at pH 5.0 and 5 °C was almost
identical with that under the same solvent conditions at
22 °C, suggesting that the population of unfolded F20W/
Y32A was not changed significantly by varying the temperature
between 22 °C and 5 °C. A matrix of
64 × 32 × 512 complex points was acquired with spectral
widths of 1500.0, 1500.0 and 9000.9 Hz (F1, F2 and F3)
using a mixing time of 600 ms and a recycle delay of
1.9 s. Some 24 scans were acquired for each FID. For
both ¹⁵N dimensions (F1 and F2), the sizes of the time
domain were doubled via forward-backward linear
prediction. The data were apodized with a 65°-shifted
squared sine-bell in all three dimensions, zero-filled and
extracted (only 4.70 ppm to 11.0 ppm was retained) to
give a final 3D data set of 512 × 256 × 128 real points.
All the NMR spectra were processed using the NMRPipe
software system (Delaglio et al., 1995).

To measure the paramagnetic relaxation enhancement
due to the introduced nitroxide spin labels, ¹H,¹⁵N-HSQC
spectra were collected using the pulse sequence
described (Kay et al., 1992) on 0.5 to 1.0 mM protein
samples at pH 5.0, 22 °C before and after reduction by a
threefold molar excess of ascorbic acid in a 5 µl volume.
All the spectra were apodized with a 54°-shifted squared
sine-bell in both dimensions and zero-filled. The intensities
of peaks in the HSQC spectra of both the oxidized
and the reduced forms were measured. The effects of
paramagnetic enhancement were determined by spectral
simulation as described below (the details are described
by Gillespie & Shortle, 1997). First, the PRE effects on a
set of Lorentzian peaks were simulated by multiplying
their FIDs with an exponential window function using
varying amounts of line broadening. This simulates the
transverse relaxation of the amide group protons beginning
with the first 90° pulse of the HSQC experiment. To
simulate relaxation during the pulse sequence of the
HSQC experiment prior to signal acquisition, the first
18 ms of the FID was discarded because the HSQC pulse
sequence used in this study involves a total of 18 ms of
fixed delay during which the amide group proton magnetization
resides in the transverse plane prior to signal
acquisition. The remainder of the FID was Fourier-transformed
after apodizing with a 54°-shifted squared
sine-bell and zero-filling. Since the decrease in peak intensity
depends on both the initial linewidths and the time
constant of the exponential window function, two sets of
simulation curves (basically, a plot of relative intensity
versus broadening in Hz) were obtained for resonances
with linewidths corresponding to those measured in
F20W/Y32A, namely 15 and 20 Hz. Based on these
simulation curves, the amount of line broadening corresponding
to the experimentally measured intensity ratio
of the oxidized versus the reduced forms was considered
to be the paramagnetic relaxation enhancement (y-axis in
Figure 6).
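
The spectral simulation described above can be sketched in a few lines of
NumPy. This is a minimal illustration under stated assumptions, not the
procedure actually used: the function names, the number of points, the
acquisition time and the exact window definition are hypothetical choices,
and a single on-resonance Lorentzian line is simulated in one dimension only.

```python
import numpy as np


def shifted_sine_bell_squared(n_points, shift_deg=54.0):
    """Squared sine-bell running from `shift_deg` to 180 degrees (assumed form)."""
    start = np.deg2rad(shift_deg)
    phase = start + (np.pi - start) * np.arange(n_points) / (n_points - 1)
    return np.sin(phase) ** 2


def peak_height(linewidth_hz, broadening_hz, n_points=512,
                acq_time=0.1, delay=0.018):
    """Peak height of an on-resonance Lorentzian line after extra broadening.

    The FID decays from t = 0 (the first 90-degree pulse); sampling starts at
    `delay`, which is equivalent to discarding the first 18 ms of the FID.
    """
    dt = acq_time / n_points
    t = delay + dt * np.arange(n_points)
    fid = np.exp(-np.pi * (linewidth_hz + broadening_hz) * t)
    windowed = fid * shifted_sine_bell_squared(n_points)
    spectrum = np.fft.fft(windowed, n=4 * n_points)  # zero-fill, then transform
    return spectrum.real.max()


def relative_intensity(linewidth_hz, broadening_hz):
    """Intensity ratio expected for the oxidized versus the reduced form."""
    return peak_height(linewidth_hz, broadening_hz) / peak_height(linewidth_hz, 0.0)


def broadening_from_ratio(ratio, linewidth_hz, max_broadening_hz=100.0):
    """Invert the simulated curve: intensity ratio -> line broadening in Hz."""
    grid = np.linspace(0.0, max_broadening_hz, 2001)
    curve = np.array([relative_intensity(linewidth_hz, b) for b in grid])
    # np.interp needs increasing x values; the curve falls as broadening rises.
    return float(np.interp(ratio, curve[::-1], grid[::-1]))
```

Calling `broadening_from_ratio` with a measured intensity ratio and the
appropriate intrinsic linewidth (15 or 20 Hz in the text) returns the line
broadening that is reported as the PRE.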

Simulation  of  the  PRE  effects 

For the native state of protein L, PRE data were simulated
directly from the protein structure according to the
Solomon-Bloembergen equations. Interaction distances
were measured between each pair of alpha-carbon and
backbone nitrogen atoms in the wild-type structure,
interaction strengths were taken to be proportional to
the reciprocal of the sixth power of that distance, and
distances closer than 6 Å were truncated to 6 Å. It
should be noted that the actual position of the free electron
is some distance from the alpha-carbon atom, and
thus the simulated spectra are not expected to match the
experimental spectra at very short sequence separations
(the lack of oscillation in the helix region in the simulated
native spectrum, for example, is because all alpha-carbon-amide
nitrogen pairs separated by less than four
residues in the helix are separated by less than 6 Å).

For simulations of disordered protein L, two different
models were used to create representative ensembles of
disordered structures, and the ensemble average signal
between each pair of residues was calculated. First, an
ensemble of structures built from unrelated protein fragments
was used as a generic model of the states accessible
to a disordered protein chain. Each structure was
assembled by ligating three-residue fragments picked at
random from a 155,000-residue database of protein structures
in which each entry had less than 40% sequence
homology to all others. The geometry of each fragment
in the assembled structure was determined by its φ, ψ,
and ω angles in the database using a set of ideal bond
angles and lengths (Engh et al., 1991; the torsion angles
were optimized to reproduce the native structures using
the ideal bond lengths and angles (Simons et al., 1997)).
In the buildup procedure, side-chains were approximated
with centroids. Structures with severe steric
clashes were discarded. Second, to model a disordered
chain with local interactions favored by the protein L
sequence, an ensemble of structures was assembled as
for the generic sequence model, but only three-residue
fragments (from the same database) with sequence identity
to protein L were used. There were about 30-50
different fragments in the database with the correct
sequence for each position in the protein. Local steric
clashes were reduced by requiring that the four residues
overlapping each junction between two fragments were
represented by a four-residue fragment of similar
sequence and structure in the database. Structures with
severe steric clashes were discarded.
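
The distance-based profile calculation described above can be written
compactly. The following is a minimal sketch under stated assumptions: the
function names and the array layout (one row of x, y, z coordinates per
residue, in Å) are hypothetical, and the profile is returned in arbitrary
units because only relative interaction strengths are used.

```python
import numpy as np


def pre_profile(label_ca, amide_n, min_dist=6.0):
    """Relative PRE at each backbone nitrogen for a label at one alpha-carbon.

    label_ca : (3,) coordinates of the labeled residue's alpha-carbon
    amide_n  : (n_residues, 3) coordinates of the backbone nitrogen atoms
    Distances shorter than `min_dist` are truncated to `min_dist`.
    """
    diff = np.asarray(amide_n, dtype=float) - np.asarray(label_ca, dtype=float)
    dist = np.linalg.norm(diff, axis=1)
    return np.maximum(dist, min_dist) ** -6.0


def ensemble_pre_profile(structures, label_index, min_dist=6.0):
    """Average the profile over an ensemble of (ca_coords, n_coords) pairs."""
    profiles = [pre_profile(ca[label_index], n, min_dist) for ca, n in structures]
    return np.mean(profiles, axis=0)
```

For the native state the ensemble contains a single structure; for the two
disordered models the same averaging would be applied to the fragment-built
structures that survive the steric-clash filter.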
Abstract

As  a  step  toward  selecting  folded  proteins  from  libraries  of  randomized  sequences,  we  have  designed  a  ‘loop 
entropy  reduction’-based  phage-display  method.  The  basic  premise  is  that  insertion  of  a  long  disordered 
sequence  into  a  loop  of  a  host  protein  will  substantially  destabilize  the  host  because  of  the  entropic  cost  of 
closing  a  loop  in  a  disordered  chain.  If  the  inserted  sequence  spontaneously  folds  into  a  stable  structure  with 
the  N  and  C  termini  close  in  space,  however,  this  entropic  cost  is  diminished.  The  host  protein  function  can, 
therefore,  be  used  to  select  folded  inserted  sequences  without  relying  on  specific  properties  of  the  inserted 
sequence. This principle is tested using the IgG binding domain of protein L and the lck SH2 domain as host
proteins.  The  results  indicate  that  the  loop  entropy  reduction  screen  is  capable  of  discriminating  folded  from 
unfolded  sequences  when  the  proper  host  protein  and  insertion  point  are  chosen. 

Keywords:  Phage-display;  insertion;  protein  folding;  protein  evolution;  chimeric  proteins 


What  fraction  of  the  vast  number  of  possible  polypeptide 
sequences  are  able  to  form  a  defined  three-dimensional 
structure  analogous  to  the  folded  states  of  natural  proteins? 
An  experimental  answer  to  this  fundamental  question  could, 
in  principle,  be  obtained  by  an  examination  of  the  properties 
of a large number of randomly generated polypeptide
sequences. We have developed a ‘loop entropy reduction’
screen  to  be  used  for  selecting  folded  proteins  from  large 
collections  of  natural  or  artificial  coding  sequences. 

Recent studies have shown that increasing the length of
loop regions in protein structures typically results in a
decrease in overall stability of the protein. The observed
decrease in stability has been linked to the entropic cost of
ordering the additional residues in the loop (Ladurner 1997;
Nagi 1997). These data suggest that if the unfolded inserted
sequence is of sufficient length, it may disrupt the folding of
the host protein into which it is inserted. However, if the
inserted sequence is capable of folding into an independent
structure, the entropic cost to the host protein will be minimal
and the host protein may retain its ability to fold (Betton
et al. 1997; Collinet 2000). The functional integrity of the
host protein is directly related to the conformational state of
the inserted sequence, allowing the properties of the host
protein to be used to select for folded inserted sequences.
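
The size of the entropic penalty can be estimated with a rough sketch. The
calculation below assumes the ideal (Gaussian) chain result, in which the
probability of closing a loop of n residues scales as n^(-3/2); this scaling,
the function name and the loop lengths in the example are illustrative
assumptions and are not taken from the studies cited above.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def loop_closure_penalty(n_before, n_after, temperature=298.0, exponent=1.5):
    """Estimated destabilization (kcal/mol) from lengthening a disordered loop.

    Assumes loop-closure probability ~ n**(-exponent), so the free energy
    cost of growing the loop from n_before to n_after residues is
    exponent * R * T * ln(n_after / n_before).
    """
    return exponent * R_KCAL * temperature * math.log(n_after / n_before)


if __name__ == "__main__":
    # Hypothetical example: a 4-residue turn receiving disordered inserts.
    for insert_length in (10, 30, 60):
        cost = loop_closure_penalty(4, 4 + insert_length)
        print(f"insert of {insert_length:2d} residues: ~{cost:.1f} kcal/mol")
```

Under this approximation the penalty grows only logarithmically with insert
length, so whether an unfolded insert abolishes host function depends on how
the penalty compares with the stability of the host protein.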

Several  questions  must  be  addressed  in  the  design  of  the 
loop  entropy  reduction  screen:  What  protein  should  serve  as 
the  host  protein?  Into  which  loop  should  the  sequences  be 
inserted?  How  will  the  folded  state  of  the  host  protein  be 
evaluated? In this study, we have chosen to use the IgG-binding
domain of protein L and the lck SH2 domain as host
proteins. These proteins were selected because of the large
amount of structural information available: high-resolution
structures and thermodynamic stabilities have been determined
for both proteins (Wikstrom et al. 1993; Tong et al.
1996; Scalley et al. 1997). A phage-display format was
chosen as a screening method because it has proven to be
successful in selecting rare folded variants within a collection
of highly randomized domains (Zhou et al. 1996;
Riddle et al. 1997; Kim et al. 1998). Furthermore,
filamentous phage tolerate destabilizing conditions such as
high temperature or denaturants during the selection
procedure, providing a method for adjusting the level of
selection pressure (Kristensen 1998; Forrer et al. 1999;
Jung et al. 1999).


Results 

Initially, the IgG-binding domain of peptostreptococcal protein
L was selected as a host protein. Two turns were chosen
as points for insertion (Figure 1): the turn leading from the
second β-strand into the α-helix, β2-α, and the turn leading
from the third β-strand into the fourth β-strand, β3-β4. To
determine  whether  either  or  both  insertion  points  could 
withstand  the  insertion  of  a  folded  sequence,  preliminary 
experiments  were  performed  using  wild-type  src  SH3  as  a 
model of a folded inserted sequence. The SH3 domain coding
region was introduced into the protein L DNA sequence
via two unique restriction sites (EcoRI/NdeI for β2-α, and
SalI/KpnI for β3-β4) in the protein L-gene VIII fusion construct
(Gu et al. 1995). The amino acid sequences of the
regions bordering the SH3 insertion are shown in Table 1. It
is  important  to  note  that  N  and  C  termini  of  SH3  domains 
are  close  in  space  to  one  another,  a  property  that  is  likely  to 
be  preferred  by  the  screening  method. 

Phage displaying the chimeric proteins were made and
screened for their ability to bind paramagnetic beads coated
with IgG, which protein L binds with high affinity (Kihlberg
et al. 1996). The results show clearly that the two turns
do not tolerate the insertion of SH3 equally (Table 2).
Phage displaying SH3 inserted into the β2-α turn were recovered
at very low levels, whereas phage displaying SH3
inserted into the β3-β4 turn of protein L were recovered at
levels similar to that of phage displaying wild-type protein
L. Recently it was found that the residues packed at the
β2-α interface make important contributions to IgG binding;
thus, the low recovery levels may reflect a disruption in
IgG binding resulting from the proximity of the SH3 insertion
(H. Svensson, pers. comm.). Because the β3-β4 turn
tolerates insertion of a folded protein, this turn was used for
all subsequent insertions into protein L.

A  library  of  random  sequences  coding  for  60-300  amino 
acids was constructed and inserted into the β3-β4 turn of
protein  L  (see  Materials  and  Methods).  The  library  members 
that  retained  protein  L  function  were  isolated  using  IgG- 
coated  beads.  Several  positives  were  identified  by  the 
screen,  but  initial  examination  of  their  sequences  showed 


Fig. 1. Structure of the host and inserted domains. The surface loops in
protein L (β3-β4 and β2-α) and SH2 that are used for insertion are indicated by the arrows.
The position of the A8G mutation in protein L is highlighted on the structure.
Leu32 is mutated to a glutamate in the destabilized form of the SH3
domain. Images were made using Raster 3D (Bacon and Anderson 1988;
Merritt and Murphy 1994) and Molscript (Kraulis 1991).


them to be highly polar and, thus, unlikely to fold into stable
independent structures. One sequence identified by the
screen, B11, was chosen for further analysis (see the legend
to Table 1 for the B11 sequence). Phage displaying the pL[B11]
chimera were panned against IgG-coated beads to ensure
the chimera exhibited binding activity (Table 2). The
pL[B11] chimera was recovered at levels similar to that of
wild-type protein L and the pL[SH3] chimera.

To examine the effect of folded and unfolded inserted
sequences on the stability of protein L, the SH3 and B11
chimeric proteins, pL[SH3] and pL[B11], were overexpressed
and purified. The stabilities of pL[SH3] and
pL[B11] were measured using guanidine denaturation (Fig.
2; see Materials and Methods). Both pL[SH3] and pL[B11]
exhibited a cooperative and reversible folding transition
with m-values (the denaturant dependence of the free energy
of folding) similar to that of wild-type protein L (wild-type
protein L: m = 1.8; pL[SH3]: m = 1.7; pL[B11]:
m = 1.8). The free energy of unfolding for pL[SH3] was
slightly reduced in comparison with wild-type protein L,
whereas that of pL[B11] was drastically decreased (wild-type
protein L: ΔG = 4.6 kcal/mol; pL[SH3]: ΔG = 3.4
kcal/mol; pL[B11]: ΔG = 1.0 kcal/mol). These data confirm
our assumption that the insertion of an unfolded sequence
into protein L results in a large decrease in stability,
whereas the insertion of a folded sequence results in minimal
stability loss.

Table 1. Phagemid constructs

Host      Insert     Sequence
pLβ2-α    none       T19A20[E21F22K23G24T25F26E27K28A29T30S31E32A33Y34A35]Y36A37
pLβ2-α    SH3wt      T19A20[E21F22K23G24T25F26GT9F10A11L12...(SH3)...A62P63S64D65GE27K28A29T30S31E32A33Y34A35]Y36A37
pLβ3-β4   none       T48V49[D50V51A52D53K54G55]T*Y56T57L58
pLβ3-β4   SH3wt      T48V49[E*50V51A52D53K54SGT9F10A11L12...(SH3)...A62P63S64D65G55]T*Y56T57L58
pLβ3-β4   B11        T48V49[E*50V51A52D53K54SGR1S2P3A4...(B11)...I58D59T60G61S62G55]T*Y56T57L58
SH2       none       R168D169V*170[D171Q172N173Q174G175]T*176V177V178
SH2       SH3wt      R168D169V*170[EVADKSGT9F10A11L12...(SH3)...A62P63S64D65G175]T*176V177V178
SH2       B11        R168D169V*170[EVADKSGR1S2P3A4...(B11)...I58D59T60G61S62G175]T*176V177V178
SH2       SH3L32E    R168D169V*170[EVADKSGT9F10A11L12...(SH3)...E*32...A62P63S64D65G175]T*176V177V178

The sequence of the inserted protein and the linker region are typed in bold and italics, respectively. The positions corresponding to the restriction sites
(EcoRI/NdeI for pLβ2-α; SalI/KpnI for SH2 and pLβ3-β4) are in brackets. Introduction of the KpnI site into protein L results in the insertion of a threonine
residue between G55 and Y56. Two point mutations, F170V and E176T, were introduced into SH2 to create the SalI and KpnI restriction sites and are
indicated with *. Control experiments have shown that these mutations do not impact the binding activity of phage displaying protein L or SH2. The
sequence of B11 follows: RSPAQVVDAQQNAVKDNEPSGSALGGRSAPGATRPSDQSGGSEDRSVPTEKPKEGPHID. Sequence numbering systems
for SH2 and SH3 are described in Tong et al. (1996) and Riddle et al. (1997), respectively.

The equilibrium denaturation data suggest that if protein
L were destabilized by 1-2 kcal/mol, the folding of the
pL[B11] chimera, but not that of the pL[SH3] chimera,
would be disrupted, thereby decreasing the permissiveness
of protein L with respect to loop insertions. To destabilize
protein L, panning experiments with phage displaying wild-type
protein L, pL[SH3], and pL[B11] were performed in 1
M guanidine (Table 2). In comparison with panning in the


Table  2.  Panning  recoveries  for  protein  L  experiments 


Host 

Insert 

% Recovery
0 M gnd

% Recovery
1 M gnd

pL 

none 

1.0  X  10'^ 

2.4  X  10^^ 

pLβ2-α

pLβ3-β4

pL

SH3

SH3

B11

6.0  X  10"^ 

2.0  X  10-‘ 

5.0  X  10'^ 

1.6  X  10"^ 

3.6  X  10-* 

pLA8G

none 

9.0  X  10"^ 

1.2  X  10'^ 

pLA8G

SH3 

5.0  X  10"^ 

8.4  X  10'^ 

pLA8G

B11

8.2  X  10"^ 

0 

Control 

n/a 

1.2  X  10-^ 

1.2  X  10"^ 

5 X 10^ c.f.u. of freshly prepared phage were used as input. % recovery is
calculated as 100 x (c.f.u. output/c.f.u. input). Control experiments correspond
to phage displaying the SH2 domain rather than protein L.


absence of guanidine, an approximately 10-fold loss in recovery
was observed for wild-type protein L and pL[SH3]
phage, whereas a 100-fold loss in recovery was observed for
pL[B11] phage. However, the recovery of pL[B11] was still
10-fold above background recovery levels.
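
The reasoning behind the guanidine panning experiment can be sketched numerically. The snippet below is a minimal illustration, not the fitting procedure used for the denaturation data: it assumes a two-state model, a linear dependence of the unfolding free energy on denaturant concentration, m-values in kcal/mol/M, and a temperature of 298 K (the units and the temperature are assumptions; only the ΔG and m values come from the text). The function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def unfolding_free_energy(dg_water, m_value, denaturant_molar):
    """Linear extrapolation: dG(D) = dG(water) - m * [D]."""
    return dg_water - m_value * denaturant_molar

def fraction_folded(dg_unfolding, temperature_k=298.0):
    """Two-state folded population for a given free energy of unfolding."""
    return 1.0 / (1.0 + math.exp(-dg_unfolding / (R_KCAL * temperature_k)))

# (dG of unfolding in kcal/mol, m-value) as reported in the text
CONSTRUCTS = {
    "wild-type protein L": (4.6, 1.8),
    "pL[SH3]": (3.4, 1.7),
    "pL[B11]": (1.0, 1.8),
}

def folded_populations(denaturant_molar):
    """Folded fraction of each construct at the given guanidine concentration."""
    return {
        name: fraction_folded(unfolding_free_energy(dg, m, denaturant_molar))
        for name, (dg, m) in CONSTRUCTS.items()
    }

if __name__ == "__main__":
    for conc in (0.0, 1.0):
        for name, frac in folded_populations(conc).items():
            print(f"{conc:.0f} M guanidine  {name:20s} fraction folded = {frac:.3f}")
```

Under these assumptions, pL[B11] goes from mostly folded in buffer to mostly unfolded in 1 M guanidine, while pL[SH3] remains largely folded. This matches the direction of the panning results, although the panning recoveries also depend on binding and display levels that this sketch ignores.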

We then destabilized protein L by mutating residue A8 to
a glycine (Fig. 1). This mutation was chosen because it
destabilizes the protein by 2.4 kcal/mol and preliminary
studies suggest that it does not interfere with IgG binding
(Kim et al. 2000). This strategy of destabilizing the host by
mutagenesis was used successfully in phage-display experiments
using protein G, a small single-domain protein with a
topology identical to protein L: a wild-type protein G host
tolerated 50% of randomized turn sequences, whereas a destabilized
protein G host tolerated only a small fraction of
the randomized sequences (Zhou et al. 1996). Both B11 and
SH3 were inserted into the β3-β4 turn of the A8G point
mutant and subjected to phage panning experiments. The
results were similar to those observed for the wild-type
protein L chimeras panned in the presence of guanidine:
compared with wild-type protein L, pLA8G[SH3]
phage experienced a 10-fold loss in recovery and pLA8G[B11]
phage experienced a 100-fold loss in recovery. Because
the recovery levels of pLA8G[B11] phage were still
10-fold above background levels, additional panning experiments
were performed in the presence of 1 M guanidine
(Table 2). Both pLA8G[SH3] and pLA8G[B11] were recovered
at very low levels, however, indicating that these conditions
are too stringent for the pLA8G chimeric proteins.

In an effort to find a less permissive host protein, we
turned to the lck SH2 domain. A surface loop in SH2 (Fig.
1) was chosen as the insertion point because of its central
location in the molecule and its lack of involvement in
ligand binding. To test the efficacy of SH2 as a host protein,
wild-type SH3, B11, and, as an additional model for an
unfolded sequence, a strongly destabilized SH3 point mutant,
SH3L32E, were each inserted into the SH2 domain.
Insertions were introduced into SH2 in a manner similar to
protein L, using two unique restriction sites (SalI/KpnI)
introduced into the SH2-gene VIII fusion construct. The
amino acid sequences of the regions bordering the insertions
of wild-type SH3, SH3L32E, and B11 into the SH2 domain
are shown in Table 1.

Fig. 2. Equilibrium denaturation melts of wild-type protein L, pL[SH3], and pL[B11]. Denaturation curves of wild-type protein L
(open circles), pL[SH3] (open triangles), and pL[B11] (crosses) were monitored by circular dichroism at 220 nm. The data were fit
as described by Scalley et al. (1997). Protein concentrations were 10 ± 0.2 μM for wild-type protein L, 8 ± 0.2 μM for pL[B11],
and 5 ± 0.2 μM for pL[SH3].

To  investigate  the  effects  of  the  different  insertions  on 
SH2  function,  phage  displaying  the  SH2  insertion  variants 
were  panned  using  paramagnetic  beads  coated  with  SH2 
ligand, a phosphotyrosyl peptide. The recovery of the different
phage is reported in Table 3. The results show that the
chimeric protein containing the SH3 insert is
recovered with an efficiency on the same order of magnitude
as the host protein without any insertion. The recovery


Table  3.  Panning  recoveries  for  SH2  experiments 


Host 

Insert 

%  Recovery 

SH2 

none 

1.0  X  10“- 

SH2 

SH3 

3.0  X  10-^ 

SH2 

SH3L32E

1.2  X  10-’ 

SH2 

B11

0 

Control 

n/a 

2.4  X  lO'^ 

5 X 10^ c.f.u. of freshly prepared phage were used as input. Control experiments
correspond to phage displaying protein L rather than the SH2
domain.


is reduced to background levels when either SH3L32E or
B11 is inserted into the host protein. These results indicate
that the selection of phage displaying functional chimeric
SH2 proteins can efficiently discriminate between folded
and unfolded sequences without relying on any biological
function of the inserted sequence.

Discussion 

The results of this study show that a screen based on loop
entropy reduction is capable of discriminating between
folded and unfolded sequences. We have also shown that
choosing the correct host protein and the position of insertion
are critical to the success of the screen. For example,
protein L will allow the insertion of a folded SH3 domain in
the β3-β4 turn but not the β2-α turn. However, the β3-β4
turn also accommodates the insertion of unfolded sequences,
as seen by the retention of protein L function in the
pL[B11] chimera.

By destabilizing protein L, we were able to reduce the
recovery of the unfolded insert [B11] in comparison with
the folded insert [SH3], although its recovery could not be
reduced to background levels. The close proximity of the
β3-β4 insertion site to the C terminus may contribute to the
permissiveness of protein L because only a few residues are
needed to complete the native structure, and a suitable replacement
sequence may be found in the randomized insert.
In such a scenario, the remainder of the loop and the original
C terminus would be extruded into the linker between protein
L and gene VIII.

The  underlying  idea  behind  the  loop  entropy  reduction 
screen  is  theoretically  applicable  to  any  host  protein  if  a 
phenotypic  screen  related  to  the  functional  integrity  of  the 
host  protein  is  available.  A  similar  approach  was  employed 
using  Escherichia  coli  RNase  HI  as  a  host  protein  (Doi  et 
al. 1997, 1998). In that study, random sequences were inserted
into a surface loop of RNase HI, and chimeric proteins
that retained RNase HI function were selected using
an  in  vivo  assay.  It  was  found  that  the  inserted  sequences 
that  were  folded  maintained  their  structure  on  excision  from 
the  chimera.  However,  structural  characterization  of  the  chi¬ 
meric  proteins  that  came  through  the  screen  demonstrated 
that  the  inserted  sequences  were  not  always  folded;  three 
out  of  five  of  the  chimeric  proteins  characterized  were 
found  to  have  unfolded  inserted  sequences.  We  observed  a 
similar  permissiveness  with  protein  L  as  the  host  protein, 
providing  further  evidence  that  not  all  host  proteins  are 
equally  good  at  discriminating  between  folded  and  unfolded 
inserts. 

Other  studies  that  have  probed  random  sequence  libraries 
for  folded  proteins  have  relied  on  expression  of  the  random 
sequence  as  a  screening  method.  Davidson  and  coworkers 
(Davidson  and  Sauer  1994;  Davidson  et  al.  1995)  con¬ 
structed  a  library  of  random  sequences  consisting  of  three 
amino  acids  (Q,  L,  R).  Interestingly,  the  fraction  of  se¬ 
quences  containing  some  degree  of  structure  was  large 
enough  to  allow  detection  of  folded  proteins  using  a  screen¬ 
ing  method  based  on  expression  and  solubility  of  protein 
from individual clones. In an extension of this work, Prijambada
and coworkers (1996) constructed random sequence
libraries containing all 20 amino acids. In that study,
the  investigators  concluded  that  8%  of  the  random  se¬ 
quences  were  expressed  and  soluble,  but  no  proteins  with 
extensive  secondary  structure  were  found.  To  extend  this 
work  further,  it  is  necessary  to  examine  a  larger  number  of 
sequences  than  is  possible  with  expression-based  screening 
methods. 

In this study, we have employed a phage-display screening
method that has several advantages over expression-based
screening methods. First, phage display
allows examination of many more sequences than is possible
in an expression-based screen. Second, phage display
has  been  proven  successful  in  selecting  rare  folded  variants 
within  a  collection  of  highly  randomized  domains  (Zhou  et 
al.  1996;  Riddle  et  al.  1997;  Kim  et  al.  1998).  Additionally, 
the  physical  conditions  of  the  selection  step  can  be  easily 
controlled  and  adapted  to  specific  requirements.  For  in¬ 
stance,  phage-panning  experiments  are  compatible  with  the 
presence  of  reducing  agents,  denaturants,  and  proteases  and 
these  reagents  effectively  increase  the  selection  pressure  of 
the  screen  (Kristensen  1998;  Jung  et  al.  1999).  This  feature 


may  prove  useful  in  eliminating  false  positives  associated 
with  inserted  sequences  that  are  marginally  stable. 

We  are  currently  using  the  loop  entropy  reduction  selec¬ 
tion  described  in  this  paper  to  search  for  folded  proteins  in 
libraries  of  randomized  synthetic  sequences  and  libraries  of 
shuffled  genomic  sequences.  As  a  consequence  of  the  de¬ 
sign  of  the  screen,  it  is  very  likely  that  only  modules  with 
their  N  and  C  termini  in  close  proximity  will  be  selected. 
Although  this  is  certainly  a  limitation  of  the  selection,  it 
should  be  noted  that  the  N  and  C  termini  are  near  one 
another  in  a  disproportionately  large  number  of  globular 
proteins  (Thornton  1983).  It  is  attractive  to  speculate  that 
this  relatively  high  frequency  of  proximity  between  N  and  C 
termini  is  an  evolutionary  relic  of  a  mechanism  for  gener¬ 
ating  complex,  multidomain  proteins  from  smaller  folded 
units  similar  to  our  experimental  selection  strategy:  the  in¬ 
sertion  of  folded  modules  into  loops  of  other  folded  mod¬ 
ules.  Thus,  it  is  possible  that  the  selection  experiments  may 
to  some  extent  recapitulate  the  generation  of  the  complex 
multidomain  protein  structures  found  in  nature. 

Materials  and  methods 

Preparation  and  panning  of  phage 

All  phage  were  prepared  as  described  in  Gu  et  al.  (1995).  The 
preparation  of  IgG-coated  magnetic  beads  and  the  subsequent  pro¬ 
tein  L-panning  experiments  were  performed  as  described  in  Gu  et 
al.  (1995),  except  for  the  guanidine-panning  experiments  where  1 
M  guanidine  (USB,  Ultrapure)  was  present  in  both  the  binding  and 
washing  steps.  For  the  SH2-panning  experiments,  the  streptavidin- 
coated  magnetic  beads  (Dynabeads  M-280)  were  coated  with  a 
biotinylated phosphotyrosine peptide (GGGGGGEPPQ[pY]EEIPIYL;
synthesized by Sigma Genosys). A total of 20 μL of the
streptavidin-coated beads (10 mg/mL) were incubated with 0.2 μg
peptide, 0.5% TWEEN-TBS for 1 h, washed twice with 800 μL of
0.5% TWEEN-TBS, and resuspended in 20 μL of 0.5% TWEEN-TBS.
The prepared beads (2 μL) were incubated with 5 x 10^
phage particles for 1 h in 100 μL of a 4% milk, 0.1% TWEEN-TBS
solution. The beads were washed 7 times with 800 μL of
0.5% TWEEN-TBS and resuspended in a final volume of 30 μL of
0.5% TWEEN-TBS. The phage bound to the beads were transfected
into XL1-Blue cells and plated onto LB agar with carbenicillin
to quantitate the number of phage bound to the beads.
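
The recovery figures reported in Tables 2 and 3 follow from simple colony counts. As a minimal sketch, assuming recovery is the percentage of input c.f.u. that is found again after panning, the calculation and the comparison with the background control could look as follows. The function names and the colony counts in the usage lines are hypothetical and are not data from this study.

```python
def percent_recovery(output_cfu, input_cfu):
    """Percentage of input phage (c.f.u.) recovered from the beads."""
    if input_cfu <= 0:
        raise ValueError("input_cfu must be positive")
    return 100.0 * output_cfu / input_cfu

def fold_over_background(sample_percent, control_percent):
    """How many times the sample recovery exceeds the control recovery."""
    if control_percent <= 0:
        raise ValueError("control_percent must be positive")
    return sample_percent / control_percent

if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic
    sample = percent_recovery(output_cfu=5.0e5, input_cfu=5.0e8)
    control = percent_recovery(output_cfu=5.0e3, input_cfu=5.0e8)
    print(f"sample: {sample:.3g}%  control: {control:.3g}%  "
          f"fold over background: {fold_over_background(sample, control):.0f}")
```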

Random  sequence  library 

A random sequence library was made by self-ligation of a
highly degenerate cassette: GGATCC(VNNNNB)9GGATC where
N = A,C,T,G; V = C,A,G; B = G,T,C; and GGATCC is the
BamHI cleavage site. The VNN, NNB polymer codes for polypeptides
containing all amino acids in proportions similar to the
sequence of typical soluble proteins. The cassette is symmetrical,
allowing polymerization in both orientations. A Gly-Ser sequence
corresponding to the BamHI sequence occurs every 20 amino acids
in the polymer. As only one stop codon occurs among 96 possibilities,
a significant fraction of polymerized cassettes inserted in
the host protein sequences are expected to be full length. The
cloning of the random sequences into the host protein sequence
requires the proper restriction sites and flanking sequences at each
of the ends. Two adaptor cassettes (“Start” and “Stop”) containing
a single BamHI-cohesive extremity and one uncleaved (SalI and
KpnI, respectively, for start and stop) restriction site at the other
end were introduced in a low molar ratio (1/8) in the random-cassette
ligation reaction. Under these conditions, a ladder of products
between 100 bp and 1 kb was obtained. For ligation into the
host protein, the polymerized cassettes were cleaved with SalI and
KpnI. Fragments >200 bp were gel purified and ligated into the
vector. The constructs were then electroporated into XL1-Blue E.
coli cells to give a library of 1.5 x 10^ independent clones. One
round of phage-display selection followed by a colony-lift assay
was sufficient to identify several positive clones. Two clones
containing inserts of 60 and 80 amino acids were sequenced, and both
displayed highly polar sequences. One of these sequences, B11, was
used in subsequent experiments.
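
The codon arithmetic behind the cassette design can be checked by enumeration. The sketch below expands the degenerate codons VNN and NNB, counts the stop codons among the 96 possibilities, confirms that the VNNNNB unit reads as the same degenerate pattern on the opposite strand, and estimates the fraction of inserts free of stop codons. The last estimate assumes equimolar, independent nucleotides at every degenerate position and in-frame ligation, which is an idealization; the function names are hypothetical.

```python
from itertools import product

# Degenerate nucleotide codes as defined in the text
DEGENERATE = {"N": "ACGT", "V": "ACG", "B": "CGT"}
COMPLEMENT = {"N": "N", "V": "B", "B": "V"}
STOP_CODONS = {"TAA", "TAG", "TGA"}

def expand(pattern):
    """All concrete codons encoded by a degenerate pattern such as 'VNN'."""
    return ["".join(c) for c in product(*(DEGENERATE[s] for s in pattern))]

def stop_codons_in(pattern):
    """Stop codons that the degenerate pattern can encode."""
    return [codon for codon in expand(pattern) if codon in STOP_CODONS]

def reverse_complement_pattern(pattern):
    """Degenerate pattern read on the opposite strand."""
    return "".join(COMPLEMENT[s] for s in reversed(pattern))

def stop_free_fraction(n_cassettes, repeats_per_cassette=9):
    """Probability that n ligated cassettes contain no stop codon,
    assuming equimolar, independent nucleotides at each position."""
    p_vnn = 1 - len(stop_codons_in("VNN")) / len(expand("VNN"))
    p_nnb = 1 - len(stop_codons_in("NNB")) / len(expand("NNB"))
    return (p_vnn * p_nnb) ** (repeats_per_cassette * n_cassettes)

if __name__ == "__main__":
    print("codons:", len(expand("VNN")) + len(expand("NNB")))
    print("stops:", stop_codons_in("VNN") + stop_codons_in("NNB"))
    for n in (1, 3, 5):
        print(f"{n} cassette(s): stop-free fraction = {stop_free_fraction(n):.2f}")
```

Excluding T from the first position of VNN removes all three stop codons, and excluding A from the third position of NNB leaves only TAG, which is why a single stop codon remains among the 96 possibilities.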

Equilibrium  denaturation 

The  methods  described  by  Gu  et  al.  (1995)  were  used  for  the 
overexpression  and  purification  of  the  chimeric  proteins.  Circular 
dichroism  equilibrium  denaturation  experiments  were  performed 
as described by Scalley et al. (1997).

Native state topology has been implicated as a major determinant of
protein-folding  mechanisms.  Here,  we  test  experimentally  the  robustness 
of  the  src  SH3-domain  folding  transition  state  to  changes  in  topology  by 
covalently  constraining  regions  of  the  protein  with  disulfide  crosslinks 
and  then  performing  kinetic  analysis  on  point  mutations  in  the  context 
of  these  modified  proteins.  Circularization  (crosslinking  the  N  and  C 
termini)  of  the  src  SH3  domain  makes  the  protein  topologically  sym¬ 
metric  and  causes  delocalization  of  structure  in  the  transition  state 
ensemble  suggesting  a  change  in  the  folding  mechanism.  In  contrast, 
crosslinking a single structural element (the distal β-hairpin) which is an
essential  part  of  the  transition  state,  results  in  a  protein  that  folds  30 
times  faster,  but  does  not  change  the  distribution  of  structure  in  the  tran¬ 
sition  state.  As  the  transition  states  of  distantly  related  SH3  domains 
were  previously  found  to  be  very  similar,  we  conclude  that  the  free 
energy  landscape  of  this  protein  family  contains  deep  features  which  are 
relatively  insensitive  to  sequence  variations  but  can  be  altered  by  changes 
in topology.

Keywords: protein folding; folding kinetics; folding mechanism; transition
state; SH3 domain


Introduction 

A  recent  development  in  the  protein  folding 
field  has  been  the  empirical  observation  that  native 
state  topology  is  a  major  determinant  of  folding 
rates,  with  simple  fold  proteins  folding  faster  than 
proteins  with  complicated  topologies.^  The  remark¬ 
able  correlation  found  between  the  average 
sequence  separation  of  interacting  residues  in  the 
native  structure  (contact  order)  and  the  rate  of 
folding  suggests  that  the  free  energy  barrier  to 
folding  has  a  large  entropic  contribution  while 
variations  in  the  strength  of  the  stabilizing  inter¬ 
actions  manifest  themselves  on  a  smaller  scale. 
Consistent with the idea that the molecular details
of the interactions are overshadowed by the entropic
cost of making them, several theoretical models
have been successful in predicting the transition
state  for  folding  and/or  the  folding  rate  for  small 
proteins  using  only  information  from  the  native 
state  structure.^"^  Furthermore,  recent  experimen¬ 
tal  studies  have  established  the  conservation  of 
folding  transition  states  among  homologous  pro¬ 
teins  with  the  same  topology  but  sequence  identity 
as  low  as  13%^"^  suggesting  that  once  the  native- 
state  topology  is  specified  by  the  sequence  the  fold¬ 
ing  transition  state  is  largely  determined  as  well. 
The  goal  of  our  study  is  to  explore  and  test  these 
conclusions  further. 
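
The contact order mentioned above can be made concrete with a short calculation. The sketch below computes a relative contact order as the average sequence separation of contacting residue pairs divided by the chain length. The contact lists are hypothetical toy examples, and the criterion for deciding which residues are in contact (for example a distance cutoff in the native structure) is assumed to have been applied beforehand.

```python
def relative_contact_order(contacts, chain_length):
    """Average sequence separation of contacting residue pairs,
    normalized by the number of residues in the chain."""
    if not contacts:
        raise ValueError("at least one contact is required")
    if chain_length <= 0:
        raise ValueError("chain_length must be positive")
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (chain_length * len(contacts))

if __name__ == "__main__":
    # Hypothetical 20-residue chains: mostly local versus mostly long-range contacts
    local_contacts = [(1, 4), (2, 5), (10, 13), (15, 18)]
    distant_contacts = [(1, 20), (2, 19), (3, 18), (5, 15)]
    print("local:  ", relative_contact_order(local_contacts, 20))
    print("distant:", relative_contact_order(distant_contacts, 20))
```

A topology dominated by local contacts gives a low value, and the empirical observation described in the text is that such proteins tend to fold faster than those with many long-range contacts.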

Previous  studies  of  the  folding  transition  state  of 
the src SH3 domain showed that it involves the
association of the distal β-hairpin and the diverging
turn, while the N and C termini are completely
disordered (Figure 1(a)). Here, we explore
the  robustness  of  this  transition  state  to  chain 
crosslinks in order to test the role of topology. Previously
Serrano and co-workers showed that circularly
permuting the α-spectrin SH3 domain can
change  its  transition  state  depending  on  the  site  of 
permutation,^^  while  smaller  mutations  that  stabil¬ 
ize  a  part  of  the  folding  nucleus  do  not  alter  struc¬ 
ture  elsewhere  in  the  transition  state.^'^^  This 
argued  for  a  conformationally  restricted  transition 
state, which requires the interaction of specific

Figure 1. Structure in the transition state of (a) wt, (b)
NC terminal crosslink. Color scheme is continuous from
yellow (ΦF = 1) to red (ΦF = 0.5) to blue (ΦF = 0). Asterisks
indicate negative ΦF values, which suggest the
involvement of these residues in non-native interactions
in the transition state. Graphics were generated with
Molscript and Raster3D.

parts of the molecule to overcome the loss of entropy
during folding. Here, we examine the effects on
the transition state of disulfide crosslinking the
distal β-hairpin and circularizing (linking the N
and C termini) the protein. These modifications
were  previously  characterized  kinetically  in  their 
reduced  and  oxidized  forms  to  test  backbone  con¬ 
formational  ordering  in  the  transition  state.^^ 
Crosslinking  the  distal  hairpin  increased  the  fold¬ 
ing  rate  30-fold  without  affecting  the  unfolding 
rate,  suggesting  that  this  structural  element  is  as 
conformationally constrained in the transition state
as  in  the  native  state.  Crosslinking  the  N  and  the  C 
termini  stabilized  the  protein  significantly  both  by 
increasing  the  folding  rate  and  by  decreasing  the 
unfolding  rate,  indicating  that  the  two  termini  are 


not  fully  interacting  in  the  transition  state.  Here, 
we  perform  mutational  analysis  to  determine  if 
these  modifications  affect  the  distribution  of  struc¬ 
ture  in  the  transition  state.  By  deleting  parts  of 
individual  residues  (as  in  mutations  to  an  alanine 
residue)  and  then  assessing  the  effect  of  the 
mutation  on  stability  and  folding  kinetics,  we  can 
gain  site-specific  information  about  structure  at  the 
rate-limiting  step.^^  The  degree  of  structure  for¬ 
mation  around  each  residue  in  the  transition  state 
can be conveniently represented by ΦF values,
defined as ΔΔG(U-‡)/ΔΔG(U-F) (see Materials and
Methods). In the case of the distal hairpin crosslink,
we investigate whether it allows overcoming
of  the  entropic  cost  of  ordering  earlier  in  the  fold¬ 
ing  reaction  and  thus  makes  other  parts  of  the 
transition  state  less  structured.  In  the  case  of  the 
terminal  crosslink,  we  examine  if  significantly 
decreasing  the  entropic  barrier  to  folding  and  mak¬ 
ing  the  protein  topologically  symmetric  causes  its 
transition  state  to  become  delocalized  with  all  resi¬ 
dues  contributing  equally,  or  whether  it  remains 
structurally  polarized.  A  similar  circularization 
experiment was performed on chymotrypsin
inhibitor 2 (CI2); however, circularization did not
affect  its  transition  state  probably  because  it  is  lar¬ 
gely  delocalized  even  in  the  wild-type  protein. 
Our  results  suggest  that,  at  least  for  the  src  SH3 
domain,  the  transition  state  ensemble  can  be 
shifted  when  the  native  topology  is  significantly 
perturbed  as  in  circularization,  but  not  by  stabiliz¬ 
ation  of  the  existing  nucleus. 
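
The ΦF definition used in this analysis can be illustrated with a small calculation from folding and unfolding rates. The sketch below assumes the standard two-state relations, in which the change in the barrier from the unfolded state to the transition state is RT ln(kf,wt/kf,mut) and the change in stability is RT ln(Kwt/Kmut) with K = kf/ku. The exact procedure in Materials and Methods is not reproduced here, the rate constants in the usage lines are hypothetical, and 295 K is taken from the temperature quoted for the kinetic measurements.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def phi_value(kf_wt, ku_wt, kf_mut, ku_mut, temperature_k=295.0):
    """Phi = ddG(U to transition state) / ddG(U to folded state) for one mutation."""
    rt = R_KCAL * temperature_k
    # Destabilization of the transition state relative to the unfolded state
    ddg_u_ts = rt * math.log(kf_wt / kf_mut)
    # Destabilization of the native state relative to the unfolded state
    ddg_u_f = rt * math.log((kf_wt / ku_wt) / (kf_mut / ku_mut))
    if ddg_u_f == 0:
        raise ValueError("mutation does not change stability; phi is undefined")
    return ddg_u_ts / ddg_u_f

if __name__ == "__main__":
    # Hypothetical rates (per second): a mutation that only slows folding
    print(phi_value(kf_wt=10.0, ku_wt=0.1, kf_mut=1.0, ku_mut=0.1))
    # Hypothetical rates: a mutation that only accelerates unfolding
    print(phi_value(kf_wt=10.0, ku_wt=0.1, kf_mut=10.0, ku_mut=1.0))
```

A mutation whose entire destabilization appears in the folding rate gives a value of 1 (the site is as structured in the transition state as in the native state), whereas a mutation that affects only the unfolding rate gives 0.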

Results 

Disulfide crosslinking of the distal β-hairpin

Covalent crosslinking of the distal β-hairpin is
expected  to  decrease  the  entropy  of  the  denatured 
state  and  stabilize  intrahairpin  interactions.  If  both 
chain  entropy  and  energy  are  smoothly  varying 
functions  of  the  degree  of  ordering  and  the  pos¬ 
ition  of  the  transition  state  is  determined  by  their 
imperfect  cancellation,  then  such  a  change  should 
alter  the  position  of  the  transition  state.  Consistent 
with  the  Hammond  postulate,^^  destabilizing  the 
denatured  state  would  shift  the  position  of  the 
transition  state  closer  to  the  denatured  state  and 
result in lower ΦF values in regions other than the
distal  hairpin.  In  contrast,  if  the  energy  decreases 
abruptly  when  a  large  number  of  contacts  form 
simultaneously,  the  transition  state  would  be  less 
sensitive  to  changes  in  interaction  strengths  and  it 
would be effectively "locked". In that case, we
would  expect  that  crosslinking  the  distal  hairpin 
will  increase  the  folding  rate,  but  structure  in  the 
transition  state  will  remain  the  same.  The  distal 
hairpin  was  previously  crosslinked  by  mutating 
residues  W43  and  S58  to  cysteine  residues  and 
forming  a  disulfide  bridge  between  them  under 
oxidizing conditions. Here, we perform Φ value
analysis on several mutants throughout the crosslinked
protein (denoted SS) to determine whether

Figure 2. Kinetic analysis of mutants in the context of the distal hairpin SS crosslink (panels: SS mutant; WT SH3). Rates of folding and unfolding
were measured using stopped flow fluorescence at 295 K. Continuous lines represent the best fit to the experimental
data (Kaleidagraph).


structure in the transition state has changed (F10A
and A12G (N-terminal strand), D15A and L24A
(RT loop), G29A and E30A (diverging turn), I34A
(n-src loop) and G51A and I56A (distal β-hairpin)).
The relative effect of the mutations on the rate of
folding and unfolding is very similar in the context
of the crosslinked protein and in the wild-type
(Figure 2 and Table 1), resulting in similar ΦF
values. Even though the distal hairpin is covalently
crosslinked at its base, G51A retains a ΦF value
close  to  1,  suggesting  that  this  residue  plays  an 
important  role  in  organizing  structure  at  the  turn 
(the  glycine  may  allow  the  sidechain  oxygens  of 
S47  and  T50  to  hydrogen  bond  with  neighboring 
backbone  amides,  positioning  them  to  interact  with 
the  diverging  turn).  The  only  mutations  that  exhi¬ 


bit  different  behavior  in  the  wt  and  crosslinked 
protein  are  L24A  and  I34A,  but  these  effects  can  be 
attributed  to  changes  in  local  structure  around  the 
disulfide  crosslink.  In  the  wild-type  (wt)  protein, 
I34A  destabilizes  the  transition  state  more  than  the 
native  state,  suggesting  non-native  structure  in  the 
transition  state,  however,  in  the  crosslinked  mutant 
the  I34A  mutation  has  an  intermediate  (j>p  value, 
perhaps  because  the  replacement  of  the  large  W43 
sidechain  by  cysteine  destabilizes  non-native  struc¬ 
ture  in  this  region.  L24A,  on  the  other  hand,  shows 
an  increase  in  Op  value  in  the  crosslinked  mutant 
from  0.26  to  0.42  indicating  that  the  diverging  turn 
is  slightly  better  packed  onto  the  distal  hairpin  in 
the  transition  state.  Overall,  however,  the  transition 
state  of  the  crosslinked  protein  involves  the  same 


Table 1. Kinetic parameters for the SS mutants

Mutant     ln k_f (1 M)   ln k_u (6 M)   m_f     m_u     ΔΔG_U-F   Φ_F (SS)   Φ_F (WT)^b
SS WT^a    5.83           2.83           0.716   0.600   -         -
SS F10A    5.50           4.98           0.723   0.519   -1.45     0.13       0.10
SS D15A    5.70           3.90           0.666   0.358   -0.721    0.13       -0.22
SS L24A    5.12           3.80           0.803   0.404   -0.985    0.42       0.26
SS G29A    4.42           5.82           0.630   0.638   -2.58     0.32       0.44
SS E30A    4.10           4.35           0.581   0.482   -1.91     0.53       0.62
SS G51A    3.64           3.19           0.911   0.332   -1.50     0.86       1.06
SS I56A    3.36           3.71           1.06    0.276   -1.96     0.74       0.71

k_f is reported in 1 M guanidine, while k_u is in 6 M guanidine to avoid extrapolation; m_f and m_u are the dependences of the folding and the unfolding rates, respectively, on Gnd. Typical errors for the kinetic measurements are 2-20% as reported by Riddle et al.
a Kinetic data for this mutant were published previously by Grantcharova et al.
b Φ_F values taken from the paper by Riddle et al.
structural elements as that of the wild-type SH3 domain. We can conclude that even though formation of the distal β-hairpin is required to overcome the activation barrier, it is not sufficient, even when the hairpin is strongly stabilized. The rate-limiting step involves bringing the distal β-hairpin and the diverging turn together to form a three-stranded β-sheet. Thus, stabilization of this element speeds folding, but does not alter the transition state ensemble.

Disulfide  crosslinking  of  the  N  and  C  termini 

Theoretical models of the transition state for folding emphasize the balance between loss of configurational entropy and formation of stabilizing interactions in determining which part of the molecule folds first. The combination of structural elements in the protein that can bury the most surface area while losing the least configurational entropy may nucleate folding. In the modeled free energy landscape for the src SH3 domain, there is only one set of segments (the distal hairpin and the diverging turn) that can associate with a sufficient number of favorable contacts to compensate for the loss in entropy; all other pairings are entropically too costly and too poorly populated to lead to productive folding. In particular, the two terminal strands, which form a sheet in the native state, were found to be completely unstructured in the transition state due to their large sequence separation. Our strategy here is to connect the termini and examine how the distribution of structure in the transition state changes. A circularized version of the src SH3 domain (denoted NC protein) was previously constructed by mutating residues T9 and S64 to cysteine and forming a disulfide bridge between them under oxidizing conditions. Crosslinking makes the topology of the protein symmetric, and entropically there is no reason why one three-stranded sheet would form before the other. One prediction is that circularization would greatly reduce structural polarization because it offers alternative routes for folding. Another possibility is that the same folding nucleus will be maintained because the interactions present in it are inherently more favorable. Such a breakdown in symmetry is seen in protein L, which is topologically symmetric and yet has one part of the molecule preferentially structured at the transition state. Distinguishing between these two possibilities addresses the relative importance of variations in interaction energies and chain entropy in determining the folding transition state.

In order to determine the effect of circularization on the transition state, we examined the effect of mutations in the context of the crosslinked protein. A total of 14 mutants were designed to probe different regions of the transition state: A12G, L13A, Y16A and D23A (first strand and RT loop); F26A and G29A (diverging turn); I34A and W43A (n-src loop); A45G, S47A and G51A (distal hairpin); I56A and P57A (3₁₀ helix); V61A (C-terminal strand). Kinetic analysis (Figure 3 and Table 2) reveals clear differences between some of the mutants in the NC protein and the corresponding mutations in the wt. The most significant changes are observed in the Φ_F values of residues in the distal hairpin. A45G, S47A and G51A affect both the rates of folding and unfolding in the NC mutant, while in the context of the wt protein their Φ_F values were all close to 1 (Figure 3(a) and (b)). Since we are not certain of the homogeneity of


Figure 3. Kinetic analysis of mutants in the context of the NC terminal crosslink (panels: NC mutant; WT SH3). Rates of folding and unfolding were measured using stopped-flow fluorescence at 295 K. Continuous lines represent the best fit to the experimental data (Kaleidagraph).
Table 2. Kinetic parameters for the NC crosslink mutants

Mutant     ln k_f (1 M)   ln k_u (6 M)   m_f     m_u     ΔΔG_U-F   Φ_F (NC)   Φ_F (WT)^b
NC WT^a    4.73           -0.597         0.766   0.800   -         -
NC A12G    4.41           2.76           0.874   0.637   -2.16     0.09       0.05
NC L13A    4.00           2.58           0.740   0.340   -2.29     0.19       -0.03
NC Y16A    4.01           3.28           0.752   0.431   -2.69     0.16       0.03
NC D23A    4.64           0.305          0.760   0.500   -0.580    0.09       0.13
NC F26A    3.26           1.13           0.812   0.370   -1.87     0.46       0.40
NC G29A    3.46           1.78           0.780   0.500   -2.14     0.35       0.44
NC I34A    3.66           -1.22          1.082   0.216   -0.263    c          c
NC W43A    3.97           1.75           0.830   0.337   -1.82     0.25       0.15
NC A45G    4.50           0.436          1.00    0.322   -0.737    0.18       1.20
NC S47A    3.95           0.400          0.990   0.309   -1.04     0.44       0.95
NC G51A    3.99           0.551          0.994   0.437   -1.11     0.39       1.06
NC I56A    2.67           -0.353         1.10    0.190   -1.35     0.89       0.71
NC P57A    3.78           1.95           0.770   0.550   -2.05     0.27       0.24
NC V61A    3.84           1.66           0.821   0.315   -1.84     0.28       -0.06

k_f is reported in 1 M guanidine, while k_u is in 6 M guanidine to avoid extrapolation; m_f and m_u are the dependences of the folding and the unfolding rates, respectively, on Gnd. Typical errors for the kinetic measurements are 2-20% as reported by Riddle et al.
a Kinetic data for this mutant were published by Grantcharova et al.
b Φ_F values taken from the paper by Riddle et al.
c Mutation decreases both k_f and k_u.


the transition state, the intermediate Φ_F values can mean either that the interactions in which these residues participate are not completely formed in the transition state, or that the transition state ensemble consists of some conformations in which the distal hairpin is formed and some in which the hairpin is disordered. On the other hand, mutations that probe formation of the hairpin newly created by the crosslink and the region around the 3₁₀ helix have increased Φ_F values, suggesting that these residues now contribute to stabilization of the transition state. I56, an integral part of the hydrophobic core, exhibits an increase in Φ_F value from 0.7 to 0.89 (Figure 3(c) and (d)). In the n-src loop, mutation of the buried W43 to alanine had no effect on the rate of folding in the wt context, but in the NC protein it has a Φ_F value of 0.25. Similarly, V61 (C-terminal strand), which takes part in the hydrophobic core, and L13 (N-terminal strand), which interacts with the C-terminal strand on the solvent-exposed side, both have increased Φ_F values upon mutation, from 0 to 0.28 and 0.19, respectively. Other mutations in the first strand and the RT loop (A12G and D23A) exhibit roughly the same Φ_F values in the NC mutant as in the wt protein (Φ_F values close to 0), suggesting that despite the crosslink this region remains unstructured in the transition state. F26A and G29A in the diverging turn also preserve their intermediate Φ_F values in the NC mutant.

Taken together, these data suggest that the transition state of the circularized protein is significantly different from that of the wt protein (Figure 1(a) and (b); Figure 4(b)). The transition state of the wt SH3 domain is highly polarized, with the distal hairpin and the diverging turn almost fully ordered in a three-stranded sheet and the termini disordered. In contrast, the circularized protein appears to have a more delocalized transition state, with a prevalence of intermediate Φ_F values in most of the structural elements probed. The distal hairpin, however, is still more ordered than other hairpins in the protein, probably because it has the highest density of intrahairpin interactions. It is interesting that one residue, I56 (a central hydrophobic core residue), stands out with a Φ_F value close to 1 and can therefore be viewed as the nucleus around which structure consolidates. We can surmise that, because of the circular topology of the protein, hairpin formation is not as important in the NC protein as it is in the wt. Instead, it appears that hydrophobic collapse, rather than local β-hairpin and sheet formation,


Table 3. Kinetic parameters for WT and mutants in 0.4 M sodium sulfate

Mutant      ln k_f (0.5 M)   ln k_u (5 M)   m_f     m_u     ΔΔG_U-F   Φ_F (sulfate)   Φ_F (WT)^a
WT_sulf     4.31             0.867          0.695   0.805   -         -
F10I_sulf   4.22             3.65           0.702   0.580   -1.68     0.03            -0.05
L44A_sulf   2.87             2.66           1.24    0.474   -1.89     0.45            0.54
G51A_sulf   2.33             1.12           0.959   0.519   -1.31     0.89            1.06
I56A_sulf   2.06             1.70           1.14    0.468   -1.81     0.73            0.71

k_f is reported in 0.5 M guanidine, while k_u is in 5 M guanidine to avoid extrapolation; m_f and m_u are the dependences of the folding and the unfolding rates, respectively, on Gnd. Typical errors for the kinetic measurements are as reported by Riddle et al.
a Φ_F values taken from the work by Riddle et al.
Figure 4. Comparison of Φ_F values for mutated residues in the wt and the (a) SS crosslink and (b) NC crosslink. Asterisk indicates a negative Φ_F value, which suggests the involvement of this residue in non-native interactions in the transition state.


drives  the  early  stages  of  folding  of  the  circularized 
protein,  due  to  the  decreased  cost  of  bringing 
together  residues  distant  in  the  chain. 


Global  stabilization  and  structure  in  the 
transition  state 

Probing the effect of a globally stabilizing agent on the rate-limiting step in folding provides another way to examine the robustness of the transition state. Sodium sulfate stabilizes proteins, presumably through preferential hydration, thereby facilitating hydrophobic collapse. Its effect on the kinetics of the src SH3 domain is to increase the folding rate and to decrease the unfolding rate (Table 3), indicating that protein desolvation occurs both before and after the transition state. It also decreases the denaturant dependence of the folding rate (i.e. the m_f value), suggesting that it makes the denatured state more compact. We performed kinetic analysis in the presence of 0.4 M sodium sulfate on several mutants in the wt context, which in the absence of sodium sulfate cover the full range of Φ_F values: F10I (Φ_F value of 0); G51A (Φ_F value of 1); L44A and I56A (intermediate Φ_F values). All mutants conserve their Φ_F values in the presence of sodium sulfate (Table 3), suggesting that the transition state ensemble has not been changed by addition of the salt. This further confirms that the transition state for folding is determined by deep features of the SH3 energy landscape. Similar experiments with the α-spectrin SH3 domain show that variations in pH do not affect transition state structure.


Discussion 

Changing  the  structure  of  transition 
state  ensembles 

Point mutagenesis, which probes residue-specific interactions, and covalent modifications (glycine loop insertions and disulfide crosslinking), which test long-range order in the transition state, revealed the conformationally restricted nature of the transition state ensemble of the src SH3 domain. Here, we have explored in more detail the free-energy landscape of this protein and have determined how changes in chain configurational entropy and interaction strengths affect the distribution of structure in the transition state.

Covalent crosslinking is a convenient way of altering the entropic cost of contact formation. It reduces the average sequence separation between interacting residues (i.e. the contact order (CO)) and is therefore expected to increase the rate of folding, in line with the very good correlation between CO and rate of folding observed for all the characterized two-state folding proteins. Both the NC and SS-crosslinked mutants fall within the spread observed for natural proteins on the CO versus log(k) plot (data not shown). It should be noted, however, that the distal hairpin crosslink causes a larger increase in folding rate than the NC-terminal crosslink even though the contact order of the NC protein is smaller. This is because the folding rate is sensitive to chain entropy loss in the transition state rather than the native state. The distal loop crosslink reduces the entropy of the denatured state dramatically but has essentially no effect on that of the transition state (the loop is already formed), while the NC crosslink reduces the entropy of both the denatured state and the transition state. The overall correlation between contact order and folding rate suggests that on average the distribution of contacts (contact order) in the folding transition state follows that of the native structure, but in any specific case, the effect of a crosslink on the folding rate will depend on the transition state structure and not solely on the reduction in the native state contact order.
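
As a rough illustration of how a crosslink lowers contact order, the sketch below computes the average sequence separation of a contact list with and without a covalent link. The toy contact list, the residue numbers and the shortest-path treatment of the crosslink (the link is counted as one step) are illustrative assumptions, not the procedure used for the CO versus log(k) plot.

```python
from typing import Iterable, Optional, Tuple

Contact = Tuple[int, int]


def effective_separation(i: int, j: int, crosslink: Optional[Contact] = None) -> int:
    """Shortest chain path between residues i and j, counted in residues.

    Without a crosslink this is |i - j|. With a crosslink (a, b) the path
    may also run through the link, which is counted as a single step
    (an assumption made for this illustration).
    """
    direct = abs(i - j)
    if crosslink is None:
        return direct
    a, b = crosslink
    via_link = min(abs(i - a) + 1 + abs(b - j), abs(i - b) + 1 + abs(a - j))
    return min(direct, via_link)


def contact_order(contacts: Iterable[Contact], crosslink: Optional[Contact] = None) -> float:
    """Average effective sequence separation over all contacts."""
    seps = [effective_separation(i, j, crosslink) for i, j in contacts]
    return sum(seps) / len(seps)


# Hypothetical contact map for a 60-residue chain (not SH3 data):
# two contacts between the termini and three local hairpin contacts.
toy_contacts = [(2, 58), (4, 56), (20, 30), (22, 28), (35, 45)]

co_linear = contact_order(toy_contacts)
co_circular = contact_order(toy_contacts, crosslink=(1, 60))
print(co_linear, co_circular)
```

In this toy case the local contacts keep their separation while the contacts between the termini become short-range, which is the sense in which circularization reduces CO. The calculation says nothing about which contacts are formed in the transition state, so it cannot by itself predict the change in folding rate.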

The finding that stabilization of local structure by distal hairpin crosslinking (Table 1, Figure 4(a)) and global stabilization by sodium sulfate (Table 3) do not alter the placement of the transition state along the reaction coordinate (as judged by Φ_F value distributions) indicates that there are some deep features in the energy landscape which are not altered by such changes. These results are consistent with experiments on other SH3 domains. The distantly related src and α-spectrin SH3 domains exhibit very similar transition states, and stabilizing mutations and changes in pH do not seem to affect the transition state structure of the spectrin SH3 domain. It appears, then, that SH3 domains tolerate quite large variations in sequence and experimental conditions with no change to the transition state, probably because there are no alternative structural elements that can be sufficiently stabilized to become folding nuclei. On the other hand, modifying the topology of the protein, as in circularization, can significantly change the free energy landscape to favor alternative routes for folding. Similar conclusions were drawn from the circular permutation experiments on the α-spectrin SH3 domain. Connecting the wt termini with a small peptide linker and introducing a cut in the distal hairpin resulted in a shift in the structure of the transition state towards the n-src loop and the hairpin formed by the old termini; the former distal hairpin was completely disordered at the rate-limiting step. (In contrast, circular permutations that did not involve the distal loop β-hairpin did not appear to change the folding transition state.) Therefore, shifts in transition state structure can occur when formerly distant elements are covalently linked to reduce the entropic cost of their interaction.

It should be noted that SH3 domains have more polarized folding transition states than other small proteins (CI2, ACBP, AcP, FKBP12). Therefore, changes in the structure of the folding transition state are more evident for the SH3 domains than for proteins with more delocalized folding transition state ensembles. A particularly well studied example of a protein with a more delocalized folding transition state is chymotrypsin inhibitor 2 (CI2), in which only one residue has a Φ_F value greater than 0.5. Drastic changes in the topology of CI2 through circular permutation or circularization have relatively little effect on the folding transition state.

While topology plays an important role in determining a protein's folding mechanism, the distribution of interaction energies throughout the protein also affects structure in the transition state. Recent experiments demonstrate that the transition state conservation observed for sequence homologs of the SH3 domain does not hold for structural homologs. For example, drastic mutagenesis, which weakens the interaction energies throughout the protein, can make the transition state delocalized. A sequence-simplified mutant of the src SH3 domain made predominantly of five amino acid residues (I, K, E, A, G) was found to have a more delocalized transition state (the distal hairpin is not fully formed), most likely because the interactions stabilizing the wt SH3 transition state are not strong enough in the simplified mutant to overcome the loss in entropy, so that residues from other parts of the protein have to participate (Q. Yi and D.B., unpublished results). Furthermore, destabilizing features in a particular structural element might be required for functional reasons. This results in a preferential switch in the structured parts of the transition state to other regions with more favorable interactions. PsaE, a structural homolog of the SH3 domain, has a large loop insertion at the distal hairpin (13 residues) required for its function in the photosynthetic center I of cyanobacteria. The larger entropic cost of forming stabilizing interactions makes the transition state delocalized, with high Φ_F values distributed throughout the protein (P. Bowers and D.B., unpublished results). Sso7d, a DNA-binding protein from Sulfolobus solfataricus, is another structural homolog of the SH3 domain. Its distal hairpin contains three glycine residues at the turn and two more in the β-strands, which are required for function, and is not well ordered early in folding (R. Guerois and L. Serrano, personal communication). The n-src hairpin, on the other hand, is the most regular element of structure, with a favorable hydrophobic/hydrophilic pattern along the strands and a canonical type I turn. The burial of hydrophobic surface area between the n-src loop and the C-terminal helix further favors these elements as a folding nucleus. Therefore, the topology of the SH3 fold appears to allow several alternative routes of folding.

Another example of a simple system in which the effects of topology and local structural propensity on the transition state have been examined is the GCN4-p1 coiled coil. Mutational analysis indicated that the folding of the dimeric coiled coil occurs via multiple pathways. Variations in helical propensity along the helix can favor one pathway over the others (e.g. the C terminus of the GCN4 coiled coil has a higher helical propensity and has been shown to form early in folding). Destabilizing one part of the helix (with A to G mutations) channels folding to the alternative pathways. However, crosslinking the two helices to form a monomer abolishes the symmetry, making it entropically more favorable for folding to start at the part of the helix proximal to the tether, even if it has the lowest helical propensity. In the monomeric version of the coiled coil, the topological constraints on the chain effectively limit the number of folding pathways to one and make the transition state less sensitive to variations in secondary structure.

A similar dependence of the folding mechanism on the stability of individual structural elements is observed in two proteins with symmetrical topology: protein L and protein G (an α-helix packed against two β-hairpins forming a sheet). In the transition state of protein L the first hairpin packs against the α-helix, while in protein G the second hairpin is more structured. The choice of hairpin appears to depend on the intrinsic stability of the hairpins. In protein L, the first hairpin has more favorable side-chain:main-chain hydrogen bonds, while the second hairpin contains three consecutive residues with positive φ angles. In protein G, on the other hand, the second hairpin has an extensive hydrogen bond network. Using computational protein design methods, the order of events in the folding of protein L and protein G can be switched by selectively stabilizing the hairpin normally formed late in folding (S. Nauli, B. Kuhlman & D.B., unpublished results).

The ability to change the transition state for folding tests our understanding of the factors contributing to its formation and specificity. Our results with the circularized src SH3 domain and the experimental studies on other proteins highlight the interplay of topological constraints and contact energy heterogeneity in determining the structure of the transition state ensemble.

Materials  and  Methods 

Mutagenesis 

Point mutagenesis was accomplished using the QuikChange Site-Directed Mutagenesis kit (Stratagene, La Jolla, CA). Plasmids harboring the mutations were transformed into BL21 cells, and protein was overexpressed and purified. The His-tag was not removed for the purpose of this study. All mutants were sequenced to ensure that the mutagenesis was successful, and the purified proteins were analyzed by mass spectrometry to confirm that each mutation was the expected one.

Disulfide crosslinking

The design of the SS and NC crosslink mutants was described by Grantcharova et al. For all the mutants, disulfide bonds were formed by oxidation in the presence of 20 mM potassium ferricyanide, K3Fe(CN)6, for ten minutes at room temperature. Reactions were performed in the dark because K3Fe(CN)6 is light sensitive. Disulfide formation was confirmed using Ellman's reagent.

Biophysical  analysis 

Protein solutions (100 µM) were made in 50 mM sodium phosphate (pH 6). For the experiments in sodium sulfate, 0.4 M sulfate was added to the solutions. The kinetics of folding and unfolding were followed by tryptophan fluorescence on a Bio-Logic SFM-4 stopped-flow instrument at 295 K. The unfolding reaction for the wt protein was previously determined to behave as a two-state process, and the kinetic and equilibrium data for the mutants were fit to a two-state model. For each mutant the free energy of folding is calculated as:

$$\Delta G_{U-F} = RT \ln(k_f / k_u)$$

where k_f and k_u are the rates of folding and unfolding, respectively, in the absence of denaturant. The differences in the free energy of folding (ΔΔG_U-F) and in the folding activation energy (ΔΔG_U-‡) between the wt protein and each mutant are calculated as:

$$\Delta\Delta G_{U-F} = RT\left(\ln(k_f^{mut}/k_f^{wt}) + \ln(k_u^{wt}/k_u^{mut})\right)$$

and

$$\Delta\Delta G_{U-\ddagger} = RT \ln(k_f^{mut}/k_f^{wt})$$

where k_f and k_u are the rates of folding and unfolding, respectively, at denaturant concentrations experimentally accessible for that mutant. With this sign convention, destabilizing mutations give negative ΔΔG_U-F values, as tabulated in Tables 1 to 3. This method avoids the extrapolation of k_f and k_u to 0 M denaturant and therefore does not rely on the accurate determination of the m_f and m_u values (the denaturant dependence of k_f and k_u, respectively).

The parameter Φ_F is defined as:

$$\Phi_F = \Delta\Delta G_{U-\ddagger} / \Delta\Delta G_{U-F}$$

and is interpreted as the fraction of the mutated residue's interactions that are formed in the transition state. A Φ_F value of 1 indicates that all of a residue's interactions are formed in the transition state, whereas a Φ_F of 0 means that the residue does not make stabilizing interactions in the transition state.
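
The calculation described above can be sketched in a few lines. The function name `phi_analysis`, the choice of kcal/mol as the energy unit, the reading of the tabulated rate columns as natural logarithms, and the convention of taking differences as mutant minus wild type are all assumptions made for this illustration; they were chosen because they reproduce the ΔΔG_U-F and Φ_F entries listed for SS F10A in Table 1.

```python
R_KCAL = 1.987e-3   # gas constant in kcal/(mol K); the energy unit is an assumption
T_KELVIN = 295.0    # temperature of the stopped-flow measurements


def phi_analysis(ln_kf_wt, ln_ku_wt, ln_kf_mut, ln_ku_mut, temperature=T_KELVIN):
    """Return (ddG_U-F, ddG_U-TS, phi_F) from natural-log rate constants.

    k_f and k_u must be taken at the same denaturant concentration for
    the reference protein and the mutant. Differences are mutant minus
    reference, so a destabilizing mutation gives a negative ddG_U-F.
    """
    rt = R_KCAL * temperature
    ddg_ts = rt * (ln_kf_mut - ln_kf_wt)
    ddg_uf = rt * ((ln_kf_mut - ln_kf_wt) + (ln_ku_wt - ln_ku_mut))
    return ddg_uf, ddg_ts, ddg_ts / ddg_uf


# Table 1 entries for SS WT (5.83, 2.83) and SS F10A (5.50, 4.98),
# read as ln k_f in 1 M and ln k_u in 6 M guanidine.
ddg_uf, ddg_ts, phi = phi_analysis(5.83, 2.83, 5.50, 4.98)
print(round(ddg_uf, 2), round(phi, 2))  # prints: -1.45 0.13
```

Because Φ_F is a ratio of two small differences, it becomes unreliable when ΔΔG_U-F is close to zero, so in practice one would only interpret values for mutations with a clearly measurable effect on stability.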
The strong correlation between protein folding rates and the
contact  order  suggests  that  folding  rates  are  largely  determined 
by  the  topology  of  the  native  structure.  However,  for  a  given 
topology,  there  may  be  several  possible  low  free  energy  paths  to 
the  native  state  and  the  path  that  is  chosen  (the  lowest  free 
energy  path)  may  depend  on  differences  in  interaction  energies 
and  local  free  energies  of  ordering  in  different  parts  of  the 
structure.  For  larger  proteins  whose  folding  is  assisted  by 
chaperones,  such  as  the  Escherichia  coli  chaperonin  GroEL, 
advances  have  been  made  in  understanding  both  the  aspects  of 
an  unfolded  protein  that  GroEL  recognizes  and  the  mode  of 
binding  to  the  chaperonin.  The  possibility  that  GroEL  can  remove 
non-native  proteins  from  kinetic  traps  by  unfolding  them  either 
during  polypeptide  binding  to  the  chaperonin  or  during  the 
subsequent  ATP-dependent  formation  of  folding-active  complexes 
with  the  co-chaperonin  GroES  has  also  been  explored. 
Abbreviations

AcP      acylphosphatase
Ada2h    activation domain of procarboxypeptidase
CO       contact order
EDTA     ethylenediamine tetra-acetic acid
GFP      green fluorescent protein
MDH      malate dehydrogenase
Rubisco  ribulose-1,5-bisphosphate carboxylase-oxygenase
SH       Src homology
TFE      trifluoroethanol

Introduction 

Two  aspects  of  protein  folding  mechanisms  are  considered  in 
this  review:  recent  insights  into  the  folding  behavior  of  small 
two-state  folding  proteins  and  the  action  of  the  chaperonin 
GroEL  in  assisting  the  folding  of  larger  proteins. 

Folding  of  small  proteins 

The past several years have witnessed a rapid increase in the amount of experimental data on the folding of small single-domain proteins. Comparison of results on sets of both homologous and unrelated proteins has provided considerable insight into the determinants of the folding process. In this part of the review, we present simple models that incorporate recent experimental findings and appear to capture the broad outlines of the folding process. An important feature of these models is that the folding free energy landscape is dominated by the trade-off between the unfavorable loss in configurational entropy upon folding and the gain in attractive native interactions; non-native interactions are assumed not to play a significant role. As will be discussed first, recent results suggest a picture in which several different routes through the free energy landscape with roughly equivalent free energy barriers can be consistent with the overall topology (low-resolution structure) of a protein, and sequence changes can, by lowering or raising one barrier relative to another, produce significant changes in the transition-state ensemble without large changes in the folding rate. Because our recent articles have probably overly emphasized the role of native state topology [1-3], we shall subsequently focus our attention on several examples that illustrate how variations in local free energies of ordering can modulate the folding process.

We  begin  by  considering  a  zeroth  order  model  in  which  all 
native  interactions  in  a  protein  are  equally  favorable  (i.e. 
homogeneous  contact  model).  In  such  a  model,  the  free 
energy  cost  of  forming  different  contacts  in  a  protein 
depends  solely  on  the  entropic  cost  of  restricting  the  chain 
to  allow  the  contact.  This  entropic  cost  increases  with 
increasing  sequence  separation  between  the  interacting 
residues,  as  more  of  the  chain  must  be  constrained  in  order 
to form the contact. When many of the contacts in a
protein are between residues distant in the primary sequence,
a  large  portion  of  the  chain  must  be  ordered  before  even  a 
few  favorable  contacts  can  form,  leading  to  a  large  folding 
free  energy  barrier.  Conversely,  when  interacting  residues 
are  close  in  the  protein  sequence,  the  entropic  cost  of 
chain  ordering  is  partially  compensated  by  the  formation 
of  contacts  earlier  in  the  folding  process,  leading  to  a 
smaller  folding  free  energy  barrier.  Therefore,  in  this  very 
simple model, one expects proteins with most of their
contacts between residues close in the sequence to fold faster
than  proteins  with  contacts  between  residues  distant  in 
the  sequence. 

Several  years  ago,  we  found  such  a  relationship  between 
folding rate and the average sequence separation between
contacting residues (the contact order, CO) [1]. A
considerable number of proteins have been studied in the
interim period and an updated version of the plot,
encompassing all the two-state folding proteins that have been
kinetically characterized (Table 1), shows an even stronger
correlation between CO and rate of folding (Figure 1a).
The correlation is particularly remarkable because of the
very wide variation in the folds and functions of these
proteins. It suggests that the low-resolution structure or
topology of a protein is a major determinant of the
trade-off between configurational entropy loss and formation of
attractive  interactions,  as  suggested  by  the  simple  model 
described  in  the  previous  paragraph.  The  correlation  also 


Mechanisms  of  protein  folding  Grantcharova  et  al.  71 


supports  the  assumption  that  non-native  interactions  play 
a  relatively  minor  role  in  shaping  the  folding  process  as, 
unlike  native  interactions,  they  are  not  expected  to  be 
related  to  the  native  structure. 
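
The two quantities used in this discussion can be made concrete with a short calculation. The following is a minimal Python sketch, assuming that a list of native contacts is already available as pairs of residue indices; the function names and the two 20-residue contact lists are hypothetical and serve only to illustrate the definitions given above (absolute CO as the average sequence separation between contacting residues, relative CO as that average divided by chain length).

```python
from typing import Iterable, Tuple

Contact = Tuple[int, int]  # (i, j) residue indices of one native contact


def absolute_contact_order(contacts: Iterable[Contact]) -> float:
    """Average sequence separation |i - j| over all native contacts."""
    separations = [abs(j - i) for i, j in contacts]
    if not separations:
        raise ValueError("at least one contact is required")
    return sum(separations) / len(separations)


def relative_contact_order(contacts: Iterable[Contact], chain_length: int) -> float:
    """Absolute CO divided by chain length, expressed in percent."""
    if chain_length <= 0:
        raise ValueError("chain_length must be positive")
    return 100.0 * absolute_contact_order(contacts) / chain_length


# Hypothetical 20-residue chains (illustration only, not real proteins):
# one with purely local (i, i+4) contacts, one with contacts between
# residues that are far apart in the sequence.
local_contacts = [(1, 5), (2, 6), (3, 7)]
nonlocal_contacts = [(1, 20), (3, 18), (5, 16)]

local_co = relative_contact_order(local_contacts, 20)
nonlocal_co = relative_contact_order(nonlocal_contacts, 20)
print(f"local: {local_co:.1f}%  nonlocal: {nonlocal_co:.1f}%")
```

In the simple model, the chain with the lower relative CO would be expected to fold faster. How contacts are identified from a structure (distance cut-off, atoms considered) is not specified here and would have to follow the procedure described in [1].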

In the simple zeroth order model discussed above,
increasing uniformly the strength of all interactions clearly
reduces the free energy barrier to folding (the unfavorable
entropy of ordering is better compensated by the
formation of the more favorable interactions) and the folding rate
increases.  Thus,  for  a  given  protein,  reducing  the  strength 
of  the  favorable  interactions  (i.e.  reducing  stability)  is 
expected  to  reduce  the  folding  rate.  Indeed,  there  is  a 
nearly  linear  correlation  between  folding  rate  and  stability 
for  a  given  protein  upon  changes  in  solution  conditions, 
most  notably  upon  the  addition  of  denaturant.  Also, 
within  a  protein  family,  more  stable  proteins  generally  fold 
more  rapidly  than  less  stable  proteins  [4,5].  However,  the 
correlation  between  stability  and  folding  rate  for  proteins 
with  different  folds  is  much  weaker  than  that  between  CO 
and  folding  rate,  consistent  with  the  dominant  role  of 
native  state  topology  in  determining  folding  rates  [2]. 
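
The near-linear dependence of folding rate on denaturant concentration is what allows folding rates in water to be obtained by extrapolation. The sketch below is one hedged illustration of that procedure: it assumes that ln(kf) varies linearly with denaturant concentration over the measured range, fits a straight line by ordinary least squares and reads off the intercept. The concentrations and rates are synthetic values generated for the example, not measurements, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate_rate_to_water(
    denaturant_molar: Sequence[float], folding_rates: Sequence[float]
) -> Tuple[float, float]:
    """Return (kf extrapolated to 0 M denaturant, slope of ln kf vs [D])."""
    ln_rates = [math.log(k) for k in folding_rates]
    slope, intercept = fit_line(denaturant_molar, ln_rates)
    return math.exp(intercept), slope


# Synthetic data: a hypothetical protein with kf = 100 s^-1 in water whose
# ln(kf) falls by 1.5 per molar of denaturant.
concentrations: List[float] = [1.0, 1.5, 2.0, 2.5]
rates: List[float] = [100.0 * math.exp(-1.5 * c) for c in concentrations]

kf_water, slope = extrapolate_rate_to_water(concentrations, rates)
print(f"kf(water) = {kf_water:.1f} s^-1, slope = {slope:.2f} per M")
```

An extrapolated value of this kind can differ from the true rate in water when the plot curves at low denaturant concentration, so the linear assumption should be checked against the data in each case.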

Interestingly,  there  is  a  better  correlation  between  the 
folding rate and the relative CO (average sequence
separation divided by chain length) than between the folding
rate and the absolute (unnormalized) CO (compare
Figure 1a,b). This is somewhat unexpected as the entropic
cost  of  contact  formation  is  a  function  of  the  absolute  CO, 
rather  than  of  the  relative  CO,  and  simple  models  of  the 
sort  discussed  above  predict  relationships  with  the 
absolute  CO.  If  the  improved  correlation  with  the  relative 
CO  is  borne  out  by  further  experimental  data  over  the  next 
several  years,  it  may  be  necessary  to  consider  models  in 
which there is a renormalization that removes the
dependence on the absolute length of the protein. An alternative
possibility  is  that,  for  the  proteins  in  this  set,  the  stability 
increases  with  increasing  length  and  dividing  by  the  length 
accounts  for  the  effect  of  stability  on  the  folding  rate,  albeit 
in  a  somewhat  indirect  way. 

We  frequently  encounter  two  questions  about  the  contact 
order/folding  rate  correlation.  First,  given  that  the  entropic 
cost of closing a loop in a protein is proportional to the
logarithm of the loop length, shouldn't folding rates be more
closely correlated to the logarithm of the CO? As shown in
Figure 1c, because of the limited range of the CO values,
the  relationship  between  folding  rates  and  log  CO  is  nearly 
indistinguishable  from  that  between  folding  rates  and  CO. 
Second,  as  the  magnitude  of  the  entropic  barrier  to  folding 
depends on the CO of the folding transition-state
ensemble, why is there a correlation between folding rates and
the CO of the native structure? The correlation suggests
that the CO of the native structure is, in turn, correlated
with that of the transition-state ensemble; this is not
surprising given that a reasonable fraction of the native
structure  is  usually  formed  in  the  transition-state  ensemble 
and  that  contact  lengths  tend  to  be  relatively  consistent 


Table  1 


Rates  of  folding  for  two-state  folding  proteins. 


Protein*          Log(kf)†   CO‡ (%)   ΔGu (kcal/mol)   Length§ (residues)   Temperature (°C)

Cyt b562 [62]     5.30       7.47      10.0             106                  20
Myoglobin         4.83#      8.50      8.4              154                  25
λ-repressor [63]  4.78       9.37      5.6              80                   20
PSBD [64]         4.20       11.20     2.2              41                   41
Cyt-c [65]        3.80#      11.22     8.2              104                  23
Im9 [66]          3.16       12.07     6.6              85                   10
ACBP [67]         2.85       13.99     8.2              86                   25#
Villin 14T [68]   3.25       12.31     9.8              126                  25
N-term L9 [69]    2.87       12.74     4.5              56                   25
Ubiquitin [70]    3.19       15.11     7.2              76                   25
CI2 [71]          1.75       16.40     7.6              64                   25
U1A [72]          2.53       16.91     9.9              102                  25
Ada2h [73]        2.88       16.96     4.1              79                   25
Protein G [74]    2.46       17.30     4.6              56                   25
Protein L [75]    1.78       17.62     4.6              62                   22
FKBP [76]         0.60       17.70     5.5              107                  25
HPr [77]          1.17       18.35     4.7              85                   20
MerP [78]         0.26#      18.90     3.4              72                   25
mAcP [79]         -0.64      21.20     4.5              98                   28
CspB [4]          2.84       16.40     2.7              67                   25
TNfn3 [80]        0.46       17.35     5.3              92                   20
TI I27 [80]       1.51       17.82     7.5              89                   25
Fyn SH3 [5]       1.97       18.28     6.0              59                   20
Twitchin [80]     0.18       19.70     4.0              93                   20
PsaE(a)           0.51       17.01     1.57             69                   22
Sso7d(b)          3.02       9.54      5.93             63                   25

*A nonhomologous set of simple, single-domain, non-disulfide-bonded
proteins that have been reported to fold via two-state kinetics under at
least some conditions. Reported data and representative members of
homologous families selected as previously described [1].
†Extrapolated folding rates in water. May differ from true folding rate in
water (e.g. cyt-c, protein G, ubiquitin and others) due to 'roll-over' at
low denaturant concentrations. ‡Calculated as previously described
[1]. §Length of protein in residues from first structured residue to last.
May differ from number of residues in construct characterized. #As
reported previously in [2]. (a) P Bowers, D Baker, unpublished data.
(b) L Serrano, personal communication.

within particular protein structures (in an all-helical
protein, the contact lengths are consistently shorter than in a
parallel β-sheet protein, for example).
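
The first of the two questions above can be examined directly with the values listed in Table 1. The following Python sketch, offered as an illustration rather than as the analysis behind Figure 1, computes the Pearson correlation coefficient between Log(kf) and the relative CO and between Log(kf) and the base-10 logarithm of the relative CO. The values are transcribed from Table 1; the use of base-10 logarithms for the CO and the helper name `pearson_r` are assumptions of the sketch.

```python
import math
from typing import List, Sequence, Tuple

# (protein, Log(kf), relative CO in %) transcribed from Table 1.
TABLE_1: List[Tuple[str, float, float]] = [
    ("Cyt b562", 5.30, 7.47), ("Myoglobin", 4.83, 8.50),
    ("lambda-repressor", 4.78, 9.37), ("PSBD", 4.20, 11.20),
    ("Cyt-c", 3.80, 11.22), ("Im9", 3.16, 12.07),
    ("ACBP", 2.85, 13.99), ("Villin 14T", 3.25, 12.31),
    ("N-term L9", 2.87, 12.74), ("Ubiquitin", 3.19, 15.11),
    ("CI2", 1.75, 16.40), ("U1A", 2.53, 16.91),
    ("Ada2h", 2.88, 16.96), ("Protein G", 2.46, 17.30),
    ("Protein L", 1.78, 17.62), ("FKBP", 0.60, 17.70),
    ("HPr", 1.17, 18.35), ("MerP", 0.26, 18.90),
    ("mAcP", -0.64, 21.20), ("CspB", 2.84, 16.40),
    ("TNfn3", 0.46, 17.35), ("TI I27", 1.51, 17.82),
    ("Fyn SH3", 1.97, 18.28), ("Twitchin", 0.18, 19.70),
    ("PsaE", 0.51, 17.01), ("Sso7d", 3.02, 9.54),
]


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


log_kf = [row[1] for row in TABLE_1]
rel_co = [row[2] for row in TABLE_1]
log_rel_co = [math.log10(co) for co in rel_co]

r_co = pearson_r(rel_co, log_kf)
r_log_co = pearson_r(log_rel_co, log_kf)
print(f"r(Log kf, CO) = {r_co:.2f}; r(Log kf, log CO) = {r_log_co:.2f}")
```

Both coefficients are strongly negative, which reflects the narrow range spanned by the CO values in this set; over such a range a logarithmic transformation changes the shape of the relationship very little.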

In  the  simple  zeroth  order  model,  protein  topology  is  the 
single  most  important  determinant  of  the  folding  process 
because it determines the sequence separation and
spatial arrangement of the contacting residues. Indeed,
simple  computational  models  based  on  the  homogeneous 
contact  picture  have  done  reasonably  well  at  capturing 
many  of  the  overall  features  of  protein  folding  rates  and 
mechanisms  [6-9].  However,  there  are  now  a  number  of 
examples  in  which  differences  in  local  free  energies  of 
ordering  have  a  significant  influence  on  the  folding 
mechanism, particularly in cases in which several
different pathways are equally consistent with the structure
because  of  symmetry  (see  below).  These  differences  may 
arise,  for  example,  from  particularly  unfavorable  local 
conformations  that  either  are  important  for  functional 
reasons  or  are  compensated  in  the  final  folded  structure 
Figure 1

[Plot panels: (a) Relative CO (%), (b) Absolute CO, (c) log(relative CO); vertical axis: logarithm of the folding rate.]

Current Opinion in Structural Biology

Correlation between the logarithm of the folding rate and (a) relative CO, (b) absolute CO and (c) log(relative CO).


by  particularly  favorable  nonlocal  interactions. 
Incorporation  of  these  differences  leads  to  a  model  in 
which  the  order  of  events  in  folding  depends  both  on  the 
overall topology and on the relative free energy of
ordering different parts of the chain. Given two possible routes
to the native state, which involve forming contacts
between residues equally distant along the chain, the
lowest free energy route is that involving the formation of
the lowest free energy substructures. Such a model
produces considerably better predictions of the folding rate
and of the dominant features of the structure of the
folding transition-state ensemble than the simple zeroth
order model (see Figures 2 and 3; E Alm, A Morozov,
D  Baker,  unpublished  data). 

Experimentally,  the  distribution  of  structure  in  the  folding 
transition  state  can  be  determined  by  measuring  the  effect 
of  mutations  throughout  the  protein  on  the  folding  and 
unfolding rate [10]. Fersht's Φ value notation is a
convenient way to summarize such data; a Φ value of one
indicates that the interactions made by a residue are as
ordered in the transition state as in the native state, whereas
a Φ value of zero indicates that the interactions are not
formed  in  the  transition  state  [11].  Table  2  summarizes  the 
general  properties  of  the  folding  transition  states  studied 
so  far  using  this  kind  of  analysis.  The  following  focuses  on 
several  recent  examples  that  highlight  the  interplay 
between  the  native  state  topology  and  variations  in  local 
free energies of ordering in determining the folding
mechanism (this is not a comprehensive summary of recent
advances  in  protein  folding  studies). 
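
The Φ value analysis described above can be written out in a few lines. The sketch below is a minimal illustration, assuming two-state kinetics, so that the change in stability on mutation follows from the ratio of folding to unfolding rate constants, and using the standard definition of Φ as the change in the barrier between the unfolded state and the transition state divided by the change in overall stability. The function name and the rate constants in the example are hypothetical.

```python
import math


def phi_value(kf_wt: float, ku_wt: float, kf_mut: float, ku_mut: float) -> float:
    """Phi value from wild-type and mutant folding/unfolding rate constants.

    Free energy changes are expressed in units of RT; the temperature
    cancels in the ratio, so it does not appear as an argument.
    """
    # Change in the barrier from the unfolded state to the transition state.
    ddg_transition = math.log(kf_wt / kf_mut)
    # Change in overall stability, from the two-state equilibrium K = kf / ku.
    ddg_stability = math.log((kf_wt / ku_wt) / (kf_mut / ku_mut))
    if abs(ddg_stability) < 1e-6:
        raise ValueError("mutation does not change stability; phi is undefined")
    return ddg_transition / ddg_stability


# Hypothetical rate constants (s^-1), chosen only to show the limiting cases.
phi_folding_only = phi_value(100.0, 1.0, 10.0, 1.0)     # only kf changes
phi_unfolding_only = phi_value(100.0, 1.0, 100.0, 10.0)  # only ku changes
phi_mixed = phi_value(100.0, 1.0, 10.0, 10.0)            # both change equally
print(phi_folding_only, phi_unfolding_only, phi_mixed)
```

A mutation that slows folding without affecting unfolding gives a Φ value of one, and a mutation that only accelerates unfolding gives zero, which matches the interpretation given in the text. Mutations with very small effects on stability give unreliable ratios, which is why the sketch refuses to divide by a near-zero stability change.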

GCN4 and λ repressor

The GCN4-p1 coiled coil is a particularly simple system
for  the  detailed  examination  of  the  effects  of  topology  and 
local structural propensity on the distribution of structure
in the transition-state ensemble. The rate-limiting step in
folding  involves  the  association  of  two  monomers  to  form 
a  dimer  in  which  hydrophobic  residues  are  partially 
buried,  but  the  helices  are  not  completely  formed.  The 
C-terminal  region  of  the  helix  exhibits  higher  helix 
propensity  and  mutations  in  that  region  have  larger  effects 
on  the  folding  rate  than  mutations  in  the  N  terminus 
[12,13]. Interestingly, the effect of mutations on the
folding rate can be altered by manipulating the helix
propensity  throughout  the  helix  with  the  help  of  additional 
mutations.  For  example,  once  the  N  terminus  of  the  helix 
is  stabilized  by  two  alanine  substitutions,  a  subsequent 
mutation  at  the  C  terminus  has  a  relatively  small  effect  on 
folding,  and  when  the  C  terminus  is  destabilized  by  a 
glycine substitution, a subsequent mutation at the N
terminus has a much larger effect on folding than in the
wild-type  protein  [12].  Thus,  whereas  in  the  wild-type 
protein  the  rate-limiting  step  appears  to  involve  primarily 
the  association  of  C-terminal  portions  of  the  two  helices 
[13],  association  of  the  N-terminal  regions  can  nucleate 
folding  if  the  N  terminus  is  stabilized  or  the  C  terminus  is 
destabilized. Such malleability is expected given the
symmetry of the helix: it appears that the rate-limiting step
involves the pairing of helical regions of the two
monomers, but whether these are C-terminal or
N-terminal depends on the details of the sequence and can be
perturbed  by  mutations  that  alter  the  helix  propensity. 
However,  when  the  symmetry  is  broken  by  connecting  the 
N termini of the helices with a covalent cross-link, the
portions of the helices adjacent to the (N-terminal) cross-link
are  largely  formed  and  the  C-terminal  regions  are  largely 
disrupted  in  the  transition  state,  regardless  of  the  intrinsic 
helical  propensities  [12].  Therefore,  in  this  system,  local 
structural  biases  have  some  influence  on  the  transition 
state  when  multiple  folding  routes  are  equally  consistent 
with  the  overall  topology  because  of  symmetry  (the  dimeric 
Table 2


Folding transition states characterized by mutational analysis.


Protein                  Fold             Number of   Number of
                                          residues    mutants

λ repressor              α helix          80          8
  Transition state (TS) characteristics: Some helices are more structured
  in the TS than others; multiple folding pathways were postulated because
  of the dramatic effect of single mutations and temperature on TS
  structure [14,15]

ACBP                     α helix          86          26
  TS characteristics: Terminal helices come together in the TS, while the
  rest of the protein is involved in non-native interactions; conserved
  hydrophobic residues are important in the TS [67]

GCN4 coiled coil         α helix
  Monomer                                 72          3
  Dimer                                   36/36       3
  TS characteristics: TS for coiled-coil formation is different when the
  two helices are cross-linked and when they form a dimer [12,13]

src SH3 domain           β barrel         57          57
α-Spectrin SH3 domain                     62          17
  TS characteristics: TS is structurally polarized, with part of the
  protein fully formed and the rest fully disordered; TS is conserved
  among distant sequence homologs [3,22]

PsaE                     β barrel         69          18
Sso7d                                     63          24
Simplified SH3                            56          5
  TS characteristics: These proteins are structural homologs of the SH3
  domain, but do not exhibit the same TS (P Bowers, D Baker, unpublished
  data; L Serrano, personal communication; Q Yi, D Baker, unpublished data)

src SH3 circ             β barrel         57          14
src SH3 cross                             57          9
  TS characteristics: Circularization (circ) makes the TS more
  delocalized, whereas cross-linking (cross) of the distal hairpin leaves
  it unchanged [25]

Spectrin SH3 perm1       β barrel         62          7
Spectrin SH3 perm2                        62          8
  TS characteristics: Permutation at the distal hairpin, but not at the
  RT loop, causes a shift in the structure of the TS [81]

TNfn3                    β sandwich       92          48
  TS characteristics: Structurally polarized: a ring of core residues from
  the central β strands forms the folding nucleus, while the terminal
  strands are disordered [82]

Ada2h                    α/β (βαββαββ)    81          15
AcP                                       98          26
U1A                                       102         13
S6                                        101         ?
  TS characteristics: The topology of this fold allows several different
  TSs, depending on which helix is more structured [19-21,73]

Protein L                α/β (ββαββ)      62          70
Protein G                                 57          19
Protein G_Nu                              57          4
  TS characteristics: The symmetric topology of the protein allows for two
  possible TSs, depending on which hairpin is more stable; stabilizing the
  opposite hairpin leads to a switch in the transition state
  (protein G_Nu) ([16,17]; S Nauli, B Kuhlman, D Baker, unpublished data)

CI2                      α/β              64          150
  TS characteristics: Delocalized TS, with most of the interactions only
  partially formed [71]

CI2 circ                 α/β              64          11
CI2 perm                                  64          11
CI2 frag                                  40/24       23
  TS characteristics: Circularization (circ), circular permutation (perm)
  and fragmentation (frag) do not change the delocalized TS [83]

FKBP                     α/β              107         34
  TS characteristics: [76,84]

CheY                     α/β              129         34
  TS characteristics: [85]

p13suc1                  α/β              113         57
  TS characteristics: [86]

Arc repressor            α/β              53          44
  TS characteristics: Delocalized TS [87]


case).  However,  when  the  topology  strongly  favors  one 
particular  route  to  the  native  state  because  of  the  reduced 
entropic  cost  of  forming  more  local  interactions  (the 
monomeric  case),  secondary  structure  propensities  are  of 
little  consequence. 

The λ repressor, another all α-helical protein, has also
been postulated to fold by a number of pathways,
depending on the intrinsic stability of each helix. Both point
mutations [14] and temperature [15] have been shown to
significantly change structure in the transition state.


Protein  L  and  protein  G 

Protein  L  and  protein  G  are  structural  homologs,  but  have 
little  detectable  sequence  similarity.  Both  proteins  consist 
of an α helix packed across a four-stranded sheet formed by
two symmetrically disposed β hairpins. Remarkably, the
symmetry of the fold is almost completely broken during
folding: in protein L, the first hairpin is formed and the
second disrupted at the rate-limiting step in folding, whereas
in protein G, the second hairpin is formed and the first is
disrupted [16,17] (Figure 2). Thus, despite the small size
(~60 residues) of the two proteins and their topological
Figure 2

Folding transition states of (a) protein G and (b) protein L. Left, predicted
phi values; right, experimental phi values. The color scheme is continuous
from red (Φ = 1; structured in the transition state) to blue (Φ = 0;
unstructured in the transition state). Sites not probed experimentally are
indicated in white. Graphics were generated with molscript [88].

Predicted phi value distributions were obtained from the highest free
energy configurations along the lowest free energy paths between the
unfolded and native states, as described in [6], except that additional
terms representing hydrogen bonding and local sequence/structure
preferences were included in the free energy function. The second
β hairpin is favored by the computational model for protein G, because of
an extensive hydrogen-bond network, and the first hairpin is favored by
the model for protein L, because the second β turn has considerable
torsional strain (three consecutive residues with positive phi angles).

symmetry, there is a definite hierarchy to structure
formation. The characterization of the two transition states
suggests that the lowest free energy route to the native
state for this fold involves formation of one of the two
β hairpins; however, the choice of hairpin is determined by
factors  beyond  native  state  topology.  Interestingly,  with  the 
addition  of  hydrogen  bonding  and  sequence-  and  structure- 
dependent  local  free  energies  of  ordering,  the  simple 
computational  model  described  above  [6]  recapitulates  the 
experimentally  observed  symmetry  breaking  (Figure  2). 

The correspondence between the predicted and
experimentally determined phi values suggests that the hairpin formed
at the rate-limiting step is the one with the lowest free
energy of formation. To test this hypothesis, computational
protein design methods [18] have recently been used to
specifically stabilize the first β hairpin of protein G, which, as
noted above, is not formed in the transition state in the
wild-type protein. A redesigned protein G variant with a more
optimal backbone conformation and sequence in the first
hairpin folds 100-fold faster than the wild-type protein.
Subsequent mutational analysis shows that the first β
hairpin, rather than the second β hairpin (as in the wild-type), is
formed in the transition state in the redesigned protein
(S Nauli, B Kuhlman, D Baker, unpublished data). Likewise,
following stabilization by redesign of the second hairpin of
protein L, which contains three consecutive residues with
positive phi angles in the wild-type structure, and
destabilization of the first hairpin, the second hairpin was found to
be better formed in the folding transition-state ensemble
than the first turn (D Kim, B Kuhlman, D Baker,
unpublished data). These switches in folding mechanism highlight
the differences that local free energies of ordering can make when
the overall topology has considerable symmetry.

AcP, Ada2h, U1A and S6

The folding transition states of four proteins with the
ferredoxin-like fold (two helices packed against one side of
a five-stranded β sheet) have been characterized. The
folding transition states of Ada2h (activation domain of
procarboxypeptidase) and AcP (acylphosphatase) are
similar, despite the low sequence similarity (13%) between the
two proteins and variations in the length of the secondary
structural elements [19,20]. In both cases, the overall
topology of the protein appears to be already specified in
the transition state, but the second α helix and the inside
strands of the β sheet with which it interacts appear to be
more ordered than the rest of the polypeptide chain. The
characterization of two other members of this structural
family, however, revealed an alternative nucleus with
preferential structure around helix 1: U1A nucleates in helix 1
and S6 nucleates in both helices [21]. The topology
appears to allow several roughly equivalent folding
pathways: the choice of the dominant pathway may be
determined  by  the  detailed  packing  and  orientation  of 
structural  elements.  Proteins  with  this  fold  also  exhibit  a 
pronounced  movement  of  the  transition  state  from  20%  to 
80%  native  (as  judged  by  the  burial  of  surface  area)  with 
increasing  concentration  of  denaturant.  Remarkably,  given 
the  variation  in  the  transition-state  structure,  the  folding 
rates  of  these  proteins  are  highly  correlated  with  the  CO 
over  an  approximately  4000-fold  range  of  folding  rates. 
Furthermore,  changing  the  CO  can  significantly  change 
the folding rate: a circular permutant of U1A with CO
lower  than  that  of  the  wild-type  protein  folds  considerably 
faster  (M  Oliveberg,  personal  communication). 
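
To give a sense of scale for rate differences of this size, a ratio of folding rates can be converted into an apparent difference in free energy barrier. The sketch below assumes a simple transition-state-theory picture in which the two rates share the same pre-exponential factor, so that the barrier difference is RT ln(ratio); that assumption, the default temperature of 298.15 K and the function name are choices made for the illustration and are not taken from the studies cited above.

```python
import math

GAS_CONSTANT_KCAL = 1.987e-3  # kcal mol^-1 K^-1


def barrier_difference_kcal(rate_ratio: float, temperature_k: float = 298.15) -> float:
    """Apparent barrier difference (kcal/mol) for a given ratio of rates.

    Assumes both rates share the same pre-exponential factor, so the
    ratio depends only on the difference in activation free energy.
    """
    if rate_ratio <= 0:
        raise ValueError("rate_ratio must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(rate_ratio)


# A 4000-fold range of folding rates, as quoted for this fold.
span = barrier_difference_kcal(4000.0)
print(f"4000-fold range corresponds to about {span:.1f} kcal/mol")
```

Under these assumptions, a 4000-fold range of rates corresponds to a spread of roughly 5 kcal/mol in barrier height at room temperature.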

SH3  domain  fold 

SH3  family 

The homologous src and α-spectrin SH3 domains exhibit
very similar transition states [3,22-24], despite the low
sequence identity (36%) (Figure 3a,b). Stabilizing
mutations [23] and changes in pH [22] do not seem to affect the
structure of the transition state of the α-spectrin SH3
domain. In the case of the src SH3 domain, stabilization of
local structure by hairpin cross-linking and global
stabilization by sodium sulfate do not alter the placement of the
transition state along the reaction coordinate [25]. It
appears, then, that SH3 domains allow quite large
variations in sequence and experimental conditions with no
change  to  the  transition  state,  probably  because  there  are 
no  alternative  structural  elements  that  can  be  sufficiently 
stabilized  to  become  folding  nuclei.  On  the  other  hand, 
modifying  the  topology  of  the  protein  can  significantly 
change  the  free  energy  landscape  to  favor  alternative 
routes  for  folding.  Circularization  of  the  src  SH3  domain 
causes  the  delocalization  of  structure  in  the  transition  state 
[25]. Circular permutation experiments on the α-spectrin
SH3  domain  also  changed  the  transition  state  [26]. 
Connecting  the  wild-type  termini  with  a  small  peptide 
linker  and  introducing  a  cut  in  the  distal  hairpin  resulted  in 
a  shift  in  the  structure  of  the  transition  state  towards  the 
n-src  loop  and  the  hairpin  formed  by  the  old  termini;  the 
former distal hairpin was completely disordered at the
rate-limiting step. Therefore, shifts in transition-state structure
can occur when formerly distant elements are covalently
linked to reduce the entropic cost of their interaction.
Drastic mutagenesis, which weakens the interaction
energies throughout the protein, can also change the transition
state.  For  example,  a  sequence-simplified  mutant  of  the 
src  SH3  domain  made  predominantly  of  five  amino  acids 
(isoleucine,  lysine,  glutamic  acid,  alanine  and  glycine)  was 
found  to  have  a  more  delocalized  transition  state  (distal 
hairpin  is  not  fully  formed);  the  interactions  stabilizing  the 
wild-type  SH3  transition  state  may  not  be  strong  enough  in 
the  simplified  mutant  to  overcome  the  loss  in  entropy  and 
residues  from  other  parts  of  the  protein  may  have  to 
participate  (Q  Yi,  D  Baker,  unpublished  data). 

SH3  structural  analogs 

The  characterization  of  SH3  structural  analogs  has  shown 
that  transition-state  structure  is  not  always  conserved  in 
proteins  with  similar  topologies.  PsaE  [27],  a  photosystem 
protein  from  cyanobacteria,  has  a  large  loop  insertion  at 
the  distal  hairpin  (13  amino  acids),  making  it  entropically 
more  costly  to  form  stabilizing  interactions.  As  a  result,  its 
transition  state  is  more  delocalized  than  that  of  the  src  SH3 
domain,  with  well-ordered  residues  found  in  the  distal 
hairpin,  as  well  as  in  the  N  and  C  termini  (P  Bowers, 
D Baker, unpublished data) (Figure 3d). Sso7d, a
DNA-binding protein from Sulfolobus solfataricus [28], has a
significantly different transition state from that of the src
and α-spectrin SH3 domains. The n-src loop and the
C terminus (which is a helix in Sso7d, instead of a β strand)
are the most structured in the transition state, whereas the
distal hairpin is only weakly ordered (R Guerois,
L Serrano, personal communication) (Figure 3c). This is in
contrast to the src and α-spectrin SH3 transition states, in
which  the  distal  hairpin  is  completely  ordered.  In  the  SH3 


Figure  3 
Folding transition states of proteins with the SH3 fold: (a) src SH3
domain,  (b)  spectrin  SH3  domain,  (c)  Sso7d  and  (d)  PsaE.  Left, 
predicted  phi  values  (see  legend  to  Figure  2);  right,  experimental  phi 
values.  The  color  scheme  is  described  in  the  legend  to  Figure  2.  The 
distal  loop  is  favored  over  the  n-src  loop  by  the  computational  model 
for  the  src  SH3  domain  because  it  has  more  extensive  hydrogen 
bonding,  whereas  the  equivalent  of  the  distal  loop  is  disfavored  by  the 
model  for  Sso7d  because  it  contains  five  glycine  residues  that  are 
costly  to  order. 


domains  and  in  Sso7d,  the  contiguous  three-stranded 
sheet is formed but, in one case, the diverging turn
interacts with it, whereas in the other case, it is the C-terminal
helix. This difference may reflect variations in the free
energies of forming the structural elements: in the SH3
domains, the distal loop hairpin is well packed and the
n-src loop is irregular, whereas in Sso7d, the opposite is the
case: the equivalent of the distal hairpin contains five
consecutive glycine residues (which are likely to be
functionally important). With the inclusion of hydrogen
bonding and sequence- and structure-dependent local free
energies of ordering, the simple computational model
described above [6] produces phi values very similar to
those observed experimentally for the SH3 domains and
Sso7d. Similar results were very recently published by
Guerois  and  Serrano  (R  Guerois,  L  Serrano,  unpublished 
data;  see  Now  published). 

In  summary,  folding  transition-state  structure  is  conserved 
more  highly  within  the  SH3  sequence  superfamily  than 
among  SH3  analogs.  The  SH3  topology,  then,  although 
not  as  obviously  symmetric  as  the  protein  L/protein  G 
topology,  still  allows  several  alternative  folding  routes. 
The  prevalence  of  one  route  over  the  other  depends  on 
the  details  of  the  structure.  This  may,  in  part,  be  due  to 
the fact that functional constraints lead to the
conservation within, but not between, superfamilies of portions of
protein structures with unusual local features (the
irregular n-src and RT loops in the SH3 domain, for example,
are involved in proline-rich peptide binding) with higher
free energies of formation. These features partially
determine which of the pathways consistent with the native
state  topology  is  actually  chosen. 

The GCN4 and protein G experiments, together with the
comparisons of transition-state structures in the AcP and
SH3 families, suggest a picture in which several different
'pathways' with roughly equivalent free energy barriers
can be consistent with the overall topology. Sequence
changes can, by lowering or raising one barrier relative to
another, produce significant changes in the transition-state
ensemble without large changes in folding rate.
Consistent with this picture, our most recent models of
the folding process produce considerably more accurate
predictions of folding transition-state structures when
local free energies of ordering based on sequence-dependent
backbone torsion angles and local hydrogen bonding
terms are included. We anticipate considerable synergy
between theory and experiment, and an important role for
computational protein design methods in the further
elucidation of the mechanisms of protein folding during the
next few years.
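
The picture of several routes with roughly equal barriers can be
made concrete with a toy calculation. The sketch below is an
illustration only: it assumes independent parallel routes, each
contributing a rate proportional to exp(-barrier/RT), with the
observed folding rate taken as the sum over routes. The barrier
heights, the prefactor and the function name are hypothetical and
are not taken from the studies cited above.

```python
import math

RT = 0.593  # kcal/mol at roughly 298 K


def pathway_fluxes(barriers_kcal, prefactor=1.0e6):
    """Return (total rate, fraction of flux through each route).

    Assumes independent parallel routes, each with rate
    prefactor * exp(-barrier / RT). The prefactor is a
    hypothetical value chosen only for illustration.
    """
    rates = [prefactor * math.exp(-b / RT) for b in barriers_kcal]
    total = sum(rates)
    return total, [r / total for r in rates]


# Two routes with equal barriers (hypothetical numbers).
k_wt, frac_wt = pathway_fluxes([8.0, 8.0])

# A hypothetical sequence change raises the barrier of route 1 by
# 1.0 kcal/mol and lowers the barrier of route 2 by 0.3 kcal/mol.
k_mut, frac_mut = pathway_fluxes([9.0, 7.7])

print(f"folding rate ratio (variant / parent): {k_mut / k_wt:.2f}")
print(f"flux through route 2: {frac_wt[1]:.2f} -> {frac_mut[1]:.2f}")
```

With these illustrative numbers, the overall rate changes by less
than 10%, whereas the share of molecules using route 2 rises from
one half to roughly nine tenths, which is the qualitative behavior
described in the text.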

GroEL-GroES-assisted  folding 

How do the foregoing simple concepts apply to chaperone-assisted
folding? In small proteins, the largest free energy
barriers to folding involve the formation of particularly
nonlocal portions of protein structures and regions with
particularly unfavorable local energetics. It seems possible,
therefore, that larger proteins containing such features may
be particularly dependent on chaperones for suppressing
alternative off-pathway misfolding/aggregation. Kinetic
bottlenecks caused by unfavorable local structures or high
contact order regions may tilt the kinetic competition
between on- and off-pathway reactions in favor of the latter.
It should be emphasized, however, that non-native
interactions are likely to play a greater role in the folding
of larger proteins simply because the increased size of the
protein increases the probability of low free energy
non-native interactions. Chaperones act on such non-native
states in the first instance by binding the hydrophobic
surfaces that are exposed, preventing these surfaces from
'wrongful interactions' that lead to multimolecular
aggregation. Binding may, in some cases, be associated also with
at least partial unfolding, as discussed below for GroEL.
Release from the chaperones, in many cases driven by
ATP binding (not hydrolysis), then allows the substrate
polypeptide a chance to fold. Uniquely, in the case of the
chaperonin ring class of chaperones, polypeptide is
released into an encapsulated chamber where folding
proceeds in isolation. In the case of the bacterial chaperonin,
GroEL, this is mediated by ATP/GroES binding, which is
associated with rigid-body movements of the GroEL
intermediate and peptide-binding apical domains of the bound
ring [29] (see Figure 4). The 60° elevation and 90° twisting
of the apical domains act to remove the hydrophobic
peptide-binding sites away from the central cavity, releasing
polypeptide into this GroES-encapsulated space. Because
the character of the wall of the cavity is switched from
hydrophobic to hydrophilic as the result of the rigid-body
movements, it may influence the released polypeptide to
fold in this space because burial of exposed hydrophobic
surfaces and exposure of hydrophilic surfaces, features of
the native state, will be energetically favored.

Both cryo-EM reconstructions [30] and high-resolution
crystal structures have resolved the rigid-body domain
movements of the GroEL-GroES machinery itself during
the reaction cycle [29,31] (see Figure 4). In addition, there
are dynamic fluorescence and kinetic studies indicating,
respectively, rapid release of bound polypeptide into the
central cavity upon ATP/GroES binding (t1/2 ~1 s) and
productive folding inside the GroEL-GroES cavity [32-34].
However, the exact effects of the various states and
transitions of the GroEL-GroES machinery during the
reaction cycle on the conformation of polypeptide
substrates are not well understood because, as ensembles of
unstable non-native states, the substrates are much less
accessible to structural study, particularly in the presence
of the megadalton GroEL ring structure. Thus, our 'view'
of what is happening to substrate proteins themselves
during the GroEL-GroES reaction is poorly resolved. At this
point, the study of stringent substrates, which are
dependent on the complete system to reach their native form
and are unable to productively fold without it, seems
valuable for identifying and characterizing the full range
of steps in the reaction that are critical to producing the
native state. Nevertheless, there can also be value to
studying nonstringent substrates, particularly those whose
nonchaperoned folding is well described, because folding
behavior can be compared in the presence and absence of
chaperonin. Even small peptides may, to some extent,
simulate the behavior of a region of polypeptide chain, at
least in binding to GroEL.

Figure 4

Rigid-body movements of a GroEL subunit
attendant to ATP/GroES binding. Rigid-body
rotations about the top and bottom of the
intermediate domain redirect the peptide-binding
surface of the apical domain,
composed of helices H and I and an
underlying extended segment, from a position
facing the central cavity (lying to the right of
the subunit) to a new position facing out of
the page. The binding of peptides in the
groove between helices H and I, through
contacts with resident hydrophobic
sidechains, has been observed (see text).
Although the involvement of the extended
segment of the apical domain in polypeptide
binding has been indicated by mutational
studies, a structural basis for such interaction
remains undefined (adapted from [29]).

Binding  to  GroEL  -  potential  unfolding  action 

There are definable points in the GroEL-GroES reaction
cycle (Figure 5) at which major actions on polypeptide
substrates have been considered likely to occur. One is the
step of polypeptide binding to an open GroEL ring
(which, under physiological conditions, would be the open
ring of a GroEL-GroES-ADP asymmetric complex) [35]
(see Figure 5). Binding may be associated with at least
partial unfolding of a substrate protein, which is potentially a
means for removing a non-native form from a kinetic trap.
This could occur through either or both of two
mechanisms, one catalytic, in which GroEL lowers the energy
barriers between various non-native states, the other
thermodynamic, in which GroEL preferentially binds
less-folded states without affecting the transition states
between the various conformations. The best evidence to
date for a catalytic unfolding action associated with
binding comes from a hydrogen-deuterium exchange
experiment showing that GroEL in catalytic amounts can
globally unfold the 6 kDa protein barnase [36]. Whether
GroEL can exert similar effects on larger proteins,
including those that form stable binary complexes with it,
remains unclear. A number of exchange studies carried out
with stable binary complexes of such proteins as
α-lactalbumin [37], human dihydrofolate reductase [38,39] and
Rubisco (ribulose-1,5-bisphosphate carboxylase-oxygenase)
[40*] indicate that these proteins do not become
globally exchanged while bound to GroEL, exhibiting
modest levels of amide proton protection that are, in some
cases, localized (but see, however, [41,42], which showed
that cyclophilin and a chemically denatured β-lactamase,
respectively, were completely exchanged while bound). In
the case of Rubisco, it was possible to examine the protein
both while in a metastable intermediate state in solution
and after becoming bound to GroEL [40*]. In this case, a
high degree of protection from exchange was observed for
a small number of amide protons both in the metastable
intermediate in solution and in the binary complex with
GroEL. Thus, whatever the nature of this secondary
structure(s), it appears to be resistant to the unfolding action
associated with GroEL binding. Some proteins, however,
may nevertheless be subject to catalyzed unfolding at a
local level during the process of binding to GroEL.

The thermodynamic mechanism for unfolding in the
presence of GroEL involves the greater affinity of GroEL for
less-folded states among an ensemble of conformers that are
in equilibrium with each other [43]. This would effectively
shift the equilibrium by mass action toward the less-folded
states. Perhaps the best evidence supporting an action of
this sort comes from study of an RNase T1 mutant that
populates two non-native states, one more structured than the
other [44]. In the presence of GroEL, the less-folded state
became more populated, without alteration of the
microscopic rate constants between the two states, arguing for a
thermodynamic effect (see also [42,45,46] for descriptions of
such effects on β-lactamase, dihydrofolate reductase and
barstar). Such partitioning between non-native states has yet
to be demonstrated for stringent substrates, although the
ability of GroEL to inhibit the production of off-pathway
aggregates of malate dehydrogenase (MDH) has been
kinetically modeled to such a mechanism. In the model,
GroEL favors binding of MDH monomers and shifts an
equilibrium of low-order aggregates of MDH toward this
state [47]. Clearly, the ability to resolve different
conformational states within an ensemble of substrate proteins, both
unbound and GroEL-bound, using spectroscopic
techniques, for example, will be necessary to better characterize
the behavior of an open GroEL ring toward its substrates.
Both catalytic and thermodynamic mechanisms could be
operative, depending on the particular substrate and its
position on the landscape. Finally, although the binding of
substrate proteins is usually thought of as redirecting
off-pathway states, there seems no reason to exclude that, in at
least some cases, GroEL could recognize on-pathway
intermediates, which could also receive kinetic assistance as a
result of recruitment to the GroEL-GroES cavity.

Figure 5

GroEL-GroES reaction cycle. Non-native polypeptide is bound in the
open (trans) ring of an asymmetric GroEL-GroES-ADP (D) complex
via hydrophobic interactions with the surrounding apical domains
(panel i). Binding of ATP (T) and GroES to the same ring as the
polypeptide produces large rigid-body movements in the subunits of
the ring, elevating and twisting the hydrophobic binding surface away
from the bound polypeptide, releasing it into the encapsulated and
now hydrophilic cis chamber where folding commences (panel ii).
After 8-10 s, ATP hydrolysis occurs in the seven subunits of the
folding-active ring, relaxing the affinity of the ring for GroES and
'priming' it for release (panel iii). At the same time, cis hydrolysis
produces an allosteric adjustment of the trans ring that allows rapid
entry of ATP and non-native polypeptide (panel iv). The arrival of ATP
triggers allosteric dissociation of the cis ligands (panel v); the binding
of non-native polypeptide serves to accelerate the rate of this
departure by 30-50-fold. Note that the polypeptide can be ejected in
either a native form (N), a form committed to reaching the native state
in the bulk solution (Ic) or an uncommitted non-native state (Iuc) that
can be rebound by chaperonin. The relatively slow binding of GroES
to the new ATP/polypeptide-bound ring orders the formation of the
next folding-active GroEL-GroES complex (panel v). Thus, GroEL
alternates rings back and forth as folding-active, expending the ATP of
one ring to simultaneously initiate a new folding reaction, while
dissociating the previous one from the opposite ring. As discussed in
the text, polypeptide binding in an open GroEL ring (panels i and iv)
may be associated with an action of unfolding. The step of
ATP/GroES binding may also produce forced mechanical unfolding
(panels ii and v).
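
The thermodynamic (mass action) mechanism discussed above can be
illustrated with a minimal linkage calculation. The sketch assumes
two interconverting non-native states, a fixed concentration of
free GroEL (chaperonin in excess) and simple one-site binding to
each state; the equilibrium constant, the association constants
and the function name are hypothetical values chosen for
illustration, not measurements from the studies cited.

```python
def less_folded_fraction(K, groel_free, ka_structured, ka_less_folded):
    """Fraction of substrate (free plus bound) in the less-folded state.

    K is [less folded]/[more structured] in the absence of GroEL.
    Each state binds free GroEL with its own association constant
    (per molar). Binding changes only the populations; no rate
    constants between the two states appear in the calculation.
    """
    weight_structured = 1.0 + ka_structured * groel_free
    weight_less_folded = K * (1.0 + ka_less_folded * groel_free)
    return weight_less_folded / (weight_structured + weight_less_folded)


# Hypothetical parameters: the less-folded state binds 100-fold
# more tightly than the more structured one.
K = 0.25
without_groel = less_folded_fraction(K, 0.0, 1.0e5, 1.0e7)
with_groel = less_folded_fraction(K, 1.0e-6, 1.0e5, 1.0e7)

print(f"less-folded fraction without GroEL: {without_groel:.2f}")
print(f"less-folded fraction with 1 uM free GroEL: {with_groel:.2f}")
```

In this sketch, preferential binding alone raises the population
of the less-folded state from 20% to about 71%, without any change
to the interconversion kinetics, which is the signature that
distinguishes a thermodynamic effect from a catalytic one.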

Both catalytic and thermodynamic unfolding mechanisms
could be enabled by the ability of the multiple surrounding
GroEL apical domains to interact with a substrate protein.
Such multivalent binding was recently indicated by an
experiment with covalent GroEL rings bearing various
numbers and arrangements of binding-proficient and
binding-incompetent apical domains [48*]. A minimum of three
consecutive proficient domains was required for efficient
binding of a stringent substrate protein. In agreement, an
accompanying experiment employing cysteine cross-linking
between a bound substrate protein and a GroEL ring
observed cross-links with multiple GroEL apical domains.

Translating  binding  action  back  to  structure  - 
what  does  GroEL  recognize? 

Ultimately, it would be desirable to translate the foregoing
actions associated with chaperonin binding into structural
terms. Lacking, however, any high-resolution information
on the structure of a substrate protein bound to GroEL,
we can only extrapolate from a variety of different types of
experimental information, which, in the past year, has
been derived from proteomic, biochemical, spectroscopic
and crystallographic studies. At the level of binding to
individual apical domains, a crystallographic study
observed that a dodecamer peptide, selected for its high
affinity for an isolated apical domain, associated with it as
a β hairpin, both in a co-crystal with an isolated apical
domain and in one with full occupancy of the apical
domains of the GroEL tetradecamer [49*]. In these
structures, one strand of the hairpin contacted the apical
domain at a position between the two α helices (H and I)
facing the central cavity (see Figure 4). A host of
hydrophobic contacts were formed between tryptophan
and phenylalanine residues in the peptide and hydrophobic
sidechains in the two α helices; these helices had been
previously implicated in polypeptide binding by a
mutagenesis study [50] and by a previous crystallographic study
of an apical domain [51]. In the latter study, similar
topology and contacts were observed between an extended
N-terminal tag segment of one monomer found lying in
the groove between these two α helices in a neighboring
monomer in the asymmetric unit. In the dodecamer study,
it was additionally noted that, compared with the
unoccupied isolated apical domain crystal structure, in which a
number of regions, including the channel-facing ones,
were found to differ somewhat in positioning between
monomers in the asymmetric unit, the conformations of
the isolated domains with peptide bound became virtually
identical. This suggests that there is a structural plasticity
to the apical binding surface that accommodates the
variety of substrates and that, upon contact with a particular
substrate, optimizes contacts with it.

Lest it seem that only β strands can associate with the
GroEL apical domain, two different NMR studies
re-examined an N-terminal 13-residue peptide from the
substrate rhodanese that is known to form an α helix in the
intact native protein. This peptide had been observed
through transfer NOE effects to adopt an α-helical
structure upon association with intact GroEL [52]. In the first of
the new studies, the same transfer NOE effects were
observed when the peptide was incubated with an isolated
GroEL apical domain, and chemical shift changes could be
observed that localized to the same two cavity-facing
α helices (H and I) [53]. In the second study, carried out
with intact GroEL, D and D,L chiral forms of the same
peptide were observed to bind as well as the original L form
[54]. Whereas the D form could form a left-handed helix in
TFE, the D,L form did not form a helix. This suggested
that the hydrophobic content of the peptides was more
critical to binding than adoption of a particular secondary
structure. Two dodecameric α-helical peptides with the
same composition were also compared, observing that one
with hydrophobic sidechains clustered on one side of the
predicted helix opposite hydrophilic sidechains
(amphiphilic character) bound more strongly than another
peptide interspersing hydrophobic sidechains with
hydrophilic sidechains. This suggested that a contiguous
hydrophobic surface is the feature in a substrate favoring
its recruitment to GroEL. In a third study, a series of
14-residue peptides that exhibited α-helical character in
solution was examined [55]. In this case also, those
peptides with amphiphilic character were found to bind most
strongly to GroEL, some with submicromolar affinity.

Thus, GroEL appears able to recognize both major
secondary structural elements, so long as hydrophobic surface
is presented. It remains curious, however, that, where
examined, recognition appears to occur through the same
two apical α helices without recognizable participation of
an underlying extended segment (amino acids 199-209;
see Figure 4) that also bears hydrophobic residues,
mutation of which abolishes polypeptide binding. Thus, the
question remains as to whether this segment participates
directly in binding. Notably, the H and I α helices also
form the major contacts with the GroES mobile loop
(itself in an extended state), also through hydrophobic
interactions, after elevation and twisting of the apical
domains [29] (see Figure 4). Thus, binding through these
two α helices may be an energetically favored mode,
although polypeptide and GroES binding occur at two
very different points in space.


Both major secondary structural elements figure together
in a proteomic study identifying several dozen proteins
from Escherichia coli that could be co-immunoprecipitated
with anti-GroEL antiserum upon cell lysis in EDTA (to
inhibit nucleotide-driven dissociation) [56]. Whether any
of these are stringent substrates, that is, dependent on
GroEL-GroES for proper folding, remains to be seen, but
of this collective of bound species, where a structure of the
native form was available, the topology favored was αβ,
with two or more domains. Thus, it seems plausible that
GroEL multivalently binds individual α and β units
through exposed hydrophobic aspects that will be buried
together in the native state. This potentially stabilizes the
individual domains against inappropriate intermolecular or
even intramolecular interactions until ATP/GroES-driven
release directs an optimal chance for correct association
within the molecule, while it is confined to the cis cavity. A
direct illustration of such putative action comes from a
study of the folding of four-disulfide hen lysozyme,
composed of an α and a β domain, in the presence of GroEL
[57]. The open GroEL ring accelerated the rate of
acquisition of the native state by 1.3-fold, without affecting the
rate or mechanism of domain folding. Rather, GroEL
accelerated the slower step of proper docking of the two
domains, presumably by binding one or both individual
domains and disfavoring or reversing non-native contacts.

ATP/GroES-driven  release  of  GroEL-bound 
substrate  into  the  central  cavity  -  potential 
unfolding  action 

The action of ATP/GroES binding on polypeptide
conformation, associated with release into the GroEL-GroES
cavity, has been of major interest. An earlier study of the
substrate Rubisco, examining its tryptophan fluorescence
anisotropy, observed a rapid drop (t1/2 ~1 s), followed by a
slow rise correlating with production of the native state
[32]. The nature of the fast phase had been a mystery, but
an exchange experiment with tritium-labeled Rubisco has
begun to address this [40*]. A metastable intermediate of
this protein exhibited 12 highly protected amide tritiums
both in solution and while bound to GroEL. When ATP
and GroES were added, all but two of the tritiums were
exchanged by 5 s, the earliest time examined. The
elevation and twisting of the apical domains, driven by
ATP/GroES binding to a polypeptide-bound ring, were
proposed to produce a stretching of substrate between the
apical domains before complete release into the cavity.
Such a mechanism would couple the energy of
ATP/GroES binding to a forced unfolding action. But the
deprotection observed does not seem to be fully accounted
for by a stretching action exerted only on molecules becoming
encapsulated in the cis ring. Consider the experimental
observation that GroES binds randomly to either of the
two GroEL rings of a Rubisco-GroEL binary complex to
form two different asymmetric complexes: approximately
50% cis ternary complexes and approximately 50% trans
ternary complexes, the latter with GroES on the ring
opposite the polypeptide-bound one. Thus, one would expect
that, at a time (here, 5 s) less than that of a single turnover
(~10 s), only about half of the tritiums should be
deprotected, corresponding to those of the Rubisco molecules
that had become encapsulated in cis. Yet nearly all were
deprotected, suggesting that molecules in the trans ring
must likewise have been deprotected. Previous studies
have indicated that the trans ring of a cis complex in ATP
has no significant affinity for non-native Rubisco [35], thus
suggesting that any deprotection of Rubisco bound on that
ring must be associated with its release into the bulk
solution. Perhaps there is also a twisting action on that ring,
attended by unfolding during release. Alternatively, simple
release without unfolding may be sufficient to produce
deprotection if, for example, the protection derives from
association of the substrate with the GroEL cavity wall
(either through direct hydrogen bond formation or via
steric shielding of amide protons). Thus, more needs to be
learned about whether forced unfolding is really occurring
in this case, whether it is a general aspect of the chaperonin
mechanism and whether substrate polypeptides bound in
trans are somehow also affected. Furthermore, it remains
to be demonstrated whether such an action is required for
productive Rubisco folding.
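
The bookkeeping behind this argument can be written out
explicitly. The sketch below uses only the numbers quoted in the
text (12 protected tritiums, two remaining at 5 s, roughly equal
cis and trans populations) and assumes, for illustration, that the
measurement reports a population average over all Rubisco
molecules and that cis-encapsulated molecules lose every protected
tritium. The function and variable names are hypothetical.

```python
def mean_tritiums_retained(cis_fraction, retained_in_cis, retained_in_trans):
    """Population-average number of protected tritiums left per molecule."""
    return (cis_fraction * retained_in_cis
            + (1.0 - cis_fraction) * retained_in_trans)


protected_sites = 12    # protected amide tritiums before ATP/GroES
observed_retained = 2   # tritiums still protected at 5 s
cis_fraction = 0.5      # approximate share of cis ternary complexes

# Scenario A (assumption): cis molecules lose every tritium and
# trans molecules lose none.
cis_only_retained = mean_tritiums_retained(cis_fraction, 0, protected_sites)

# Scenario B: keep the assumption of complete loss in cis and solve
# for the largest trans retention compatible with the observation.
max_trans_retained = (observed_retained - cis_fraction * 0) / (1.0 - cis_fraction)

print(f"expected retained if only cis is deprotected: {cis_only_retained:.1f}")
print(f"observed retained: {observed_retained}")
print(f"implied upper bound on trans retention: {max_trans_retained:.1f}")
```

Under these assumptions a cis-only mechanism predicts about six
tritiums retained per molecule on average, against the two
observed, so molecules bound in trans would have to have lost at
least eight of their twelve protected tritiums.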

In a further experiment, the kinetics of tritium exchange of
the metastable Rubisco intermediate was examined in
the presence of substoichiometric concentrations of
GroEL-GroES. The observed rate of decay indicated that
molecules whose tritiums had been exchanged were
subsequently being released from cis complexes in non-native
forms that competed with the remaining pool of still
tritium-labeled Rubisco molecules for binding to GroEL
[40*]. This reflects, as established by earlier studies, the
occurrence of multiple rounds of binding and release of
non-native polypeptide from GroEL during a productive
folding reaction, underscoring the trial-and-error process of
achieving the native state, as opposed to a process in which
non-native forms remain at GroEL until productive
folding occurs (see Figure 5). Indeed, in a stoichiometric
reaction, only a few percent of Rubisco molecules reach
native form in what corresponds to any given round of
folding at chaperonin. Addition of 'trap' versions of GroEL,
able to bind but not release non-native forms, rapidly halts
a reaction, with non-native substrate physically accumulated
at the trap (e.g. [58,59]). Such observations also reflect on
the model for forced unfolding, indicating that, in and of
itself, even if it occurs, such an action is not sufficient for
producing the native state; otherwise, multiple rounds
would not be required.
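
The consequence of a low per-round yield can be estimated with a
simple geometric model. The sketch assumes that every round of
binding and release is independent, that each non-native molecule
has the same probability of reaching the native state in a round
and that all molecules that fail are rebound. The per-round value
of 5% is a hypothetical stand-in for 'a few percent', and the
function names are illustrative.

```python
import math


def native_fraction(p_per_round, rounds):
    """Cumulative fraction native after a number of independent rounds."""
    return 1.0 - (1.0 - p_per_round) ** rounds


def rounds_needed(p_per_round, target_fraction):
    """Smallest whole number of rounds that reaches the target yield."""
    return math.ceil(math.log(1.0 - target_fraction)
                     / math.log(1.0 - p_per_round))


p = 0.05  # hypothetical per-round probability of reaching native form
n = rounds_needed(p, 0.90)

print(f"rounds needed for 90% native: {n}")
print(f"native fraction after {n} rounds: {native_fraction(p, n):.3f}")
```

With these assumed numbers, several dozen rounds are needed to
convert most of the population, which is consistent with the view
that a single encapsulation event, with or without forced
unfolding, does not by itself guarantee the native state.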

By contrast, when a stable, long-lived (>100 min) cis
complex is formed between SR1, the single-ring version of
GroEL, and GroES, it produces nearly 100% recovery of
native Rubisco inside the cis cavity. This indicates a major
role, if not a dominant one, for the encapsulated cis space in
producing the native state (see also [60,61]). Furthermore,
as suggested by kinetic studies with MDH, non-native
molecules expelled into the bulk solution during a normal
folding reaction with wild-type GroEL (where the lifetime
of a cis complex is ~10 s) can form low-order aggregates on
a short time-scale [47], in part explaining why such released
forms fail to achieve the native state in the bulk solution. In
contrast, MDH molecules held in a stable cis complex
(inside SR1-GroES) are forestalled from such aggregation
and are productively folded essentially quantitatively [62].

Productive  folding  in  the  GroEL-GroES  cavity 

Although features of the GroEL-GroES cavity that favor
productive folding have been identified from
crystallographic study, the path inside it that a protein takes to the
native state is unknown. Does this chamber simulate an
infinite dilution condition? Perhaps it can for smaller
polypeptides, but the physical dimensions argue for close
confinement of larger substrates like Rubisco.
Experimentally, even in its native state, the smaller
protein GFP appeared to be tumbling into the walls of this
space, with a rotational correlation time of 42 ns, instead of
the 12 ns observed in solution [33]. Perhaps such
confinement presents limits to the conformational space that can
be explored by non-native forms, limiting their folding
trajectory. Clearly, a comparison of folding in this cis cavity
with folding at infinite dilution would be instructive and
might be possible using single-molecule techniques.

Conclusions 

In  sum,  then,  for  both  the  folding  of  small  two-state  folding 
proteins  and  chaperonin  action  on  larger  ones,  some  basic 
outlines  of  mechanism  are  now  available.  Yet  it  seems  likely 
that  there  will  be  still  other  basic  mechanistic  principles 
concerning these reactions that lie as yet unrecognized.